Gas vesicle expression systems, gas vesicle constructs and related genetic circuits, vectors, mammalian cells, hosts, compositions, methods and systems

ABSTRACT

Provided herein are genetically engineered gas vesicle expression systems (GVES) that are configured to express gas vesicles (GVs) in a mammalian cell, related gas vesicle polynucleotide constructs, gas vesicle reporting genetic circuits, vectors, genetically engineered mammalian cells, non-human mammalian hosts, compositions, methods and systems, which in several embodiments can be used together with contrast-enhanced imaging techniques to detect and report biological events in an imaging target site comprising a mammalian cell and/or organism.

CROSS REFERENCE TO RELATED APPLICATIONS

The present application claims priority to U.S. Provisional Application No. 62/789,295, entitled “Mammalian Expression Of Gas Vesicles As Acoustic Reporter Genes” filed on Jan. 7, 2019, with docket number CIT8165-P, and to U.S. Provisional Application No. 62/895,553, entitled “Burst Ultrasound Reconstruction With Signal Templates” filed on Sep. 4, 2019, with docket number CIT 8337-P, both of which are incorporated herein by reference in their entirety. The present application is also related to U.S. application Ser. No. 16/736,581 entitled “BURST Ultrasound Reconstruction with Signal Templates and related Methods and Systems” filed on Jan. 7, 2020 with docket number P2443-US and PCT Application Number PCT/US2020/012557 entitled “BURST Ultrasound Reconstruction with Signal Templates and related Methods and Systems” filed on Jan. 7, 2020 with the docket number P2443-PCT, the content of each of which is also incorporated by reference in its entirety.

STATEMENT OF GOVERNMENT GRANT

This invention was made with government support under Grant No. EB018975 and Grant No. U54CA199090 awarded by the National Institutes of Health. The government has certain rights in the invention.

FIELD

The present disclosure relates to gas-filled structures, and in particular genetically engineered gas vesicle gene expression systems, engineered gas vesicle polynucleotide constructs and related genetic circuits, vectors, mammalian cells, hosts, compositions, methods and systems and in particular related methods and systems to produce gas-filled structures and/or to image biological events in a target site.

BACKGROUND

Reporting biological events, such as gene expression, proteolysis, biochemical reactions as well as cell location and function, is currently primarily based on fluorescent reporter genes.

Challenges remain for identifying, producing and/or developing biocompatible reporters that can be imaged in deep tissues, enable multiplexed imaging of biological events, are genetically modifiable, are capable of enabling detection at nanomolar concentrations and/or produce dynamic contrast in response to local molecular signals.

SUMMARY

Provided herein are genetically engineered gas vesicle expression systems (GVES) that are configured to express gas vesicles (GVs) in a mammalian cell. Provided herein are also related genetic circuits, vectors, genetically engineered mammalian cells, compositions, methods and systems, which in several embodiments can be used together with ultrasound and/or contrast-enhanced imaging techniques to detect and report biological events in an imaging target site comprising a mammalian cell and/or organism.

According to a first aspect, a genetically engineered Gas Vesicle Expression System (GVES) is described, configured for expressing in a mammalian cell, a gene cluster of gvp genes (GVGC) encoding GV proteins capable of forming a GV type. The Gas Vesicle expression system comprises:

a gvpA/B gene expression cassette comprising a gvpA/B gene under control of a mammalian promoter and additional mammalian regulatory regions in a configuration allowing expression of a gvpA/B protein in the mammalian cell; and

one or more additional gvp gene expression cassettes comprising the gvp genes of the GV gene cluster other than gvpB, under control of a mammalian promoter and additional regulatory regions in a configuration allowing expression of the GV proteins other than the gvpA/B in the mammalian cell.

In the Gas Vesicle expression system, each of the one or more additional gvp gene expression cassettes, when comprising two or more gvp genes, further comprises a separation element between the two or more gvp genes configured to provide a separate expression of the corresponding GV protein.

In the Gas Vesicle expression system, the GVPB cassette and the one or more additional GVP cassettes are operably linked by regulatory sequences allowing co-expression of the GV proteins and formation of the GV type in the mammalian cell.
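The cassette layout described in this first aspect can be sketched as a small data structure with a consistency check. This is only an illustrative sketch: the class and function names (`Cassette`, `check_gves`) are hypothetical, the gene names and the `"CMV"` and `"P2A"` labels in the example are placeholders chosen for illustration, and the check covers only the structural statements above (a cassette carrying the gvpA/B gene, coverage of the remaining genes of the cluster, and a separation element between adjacent genes in a multi-gene cassette).

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Cassette:
    """One expression cassette: a promoter, ordered gvp genes, and separators."""
    promoter: str                       # a mammalian promoter (label is illustrative)
    genes: List[str]                    # gvp gene names, in cassette order
    separators: List[str] = field(default_factory=list)  # elements between genes

    def is_well_formed(self) -> bool:
        # n genes need exactly n-1 separation elements, one per adjacent pair
        return (bool(self.promoter)
                and len(self.genes) >= 1
                and len(self.separators) == len(self.genes) - 1)


def check_gves(cassettes: List[Cassette], cluster: Set[str],
               shell_gene: str) -> List[str]:
    """Return a list of problems; an empty list means the layout is consistent."""
    problems = []
    if not any(shell_gene in c.genes for c in cassettes):
        problems.append(f"no cassette carries {shell_gene}")
    for i, c in enumerate(cassettes):
        if not c.is_well_formed():
            problems.append(f"cassette {i} lacks a promoter or a separation element")
    covered = {g for c in cassettes for g in c.genes}
    missing = sorted(cluster - covered)
    if missing:
        problems.append("cluster genes not covered: " + ", ".join(missing))
    return problems


if __name__ == "__main__":
    # Hypothetical three-gene cluster, for illustration only
    cluster = {"gvpB", "gvpN", "gvpF"}
    layout = [
        Cassette("CMV", ["gvpB"]),
        Cassette("CMV", ["gvpN", "gvpF"], ["P2A"]),
    ]
    print(check_gves(layout, cluster, "gvpB"))
```

The check returns a list of messages rather than a single boolean so that a layout missing both a gene and a separation element reports both problems at once.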

According to a second aspect, a Gas Vesicle Polynucleotide Construct (GVPC) is described, comprising

a single gvp gene cassette comprising

-   two or more gvp genes other than gvpA/B, of a GV gene cluster encoding GV proteins configured to form a GV type,
-   a separation element located between the two or more gvp genes;
-   a mammalian promoter; and
-   additional mammalian regulatory regions;

wherein the two or more gvp genes are under control of the mammalian promoter and the additional mammalian regulatory regions in a configuration allowing expression of GV proteins encoded by the two or more gvp genes in the mammalian cell and formation of the GV type in combination with a gvpA/B protein in the mammalian cell.

According to a third aspect, a genetically engineered mammalian Gas Vesicle Reporting molecular component (GVRMC) is described. The gas vesicle reporting molecular component comprises

at least one of the Gas Vesicle expression system (GVES) and the Gas Vesicle polynucleotide construct (GVPC) herein described in which the mammalian regulatory regions comprise a gas vesicle reporting (GVR) target region configured to be activated and/or inhibited by a molecular component of a genetic circuit;

wherein the gvp genes and mammalian regulatory regions are in a configuration allowing expression of GV proteins encoded by the gvp genes through activation and/or inhibition of the gas vesicle reporting (GVR) target region, when the genetic circuit operates according to the circuit design in the mammalian cell.

According to a fourth aspect, a genetically engineered gas vesicle reporting (GVR) genetic circuit (GVRGC) configured for expression in a mammalian cell is described. In the GVR genetic circuit, molecular components are connected one to another in a mammalian cell in accordance with a circuit design by activating, inhibiting, binding or converting reactions to form a fully connected network of interacting components.

The GVR genetic circuit comprises a mammalian Gas Vesicle Reporting Molecular Component (GVRMC) herein described in a configuration in which GV proteins encoded by the gvp genes of the GVRMC are expressed and a gas vesicle (GV) type is provided when the genetic circuit operates according to the circuit design.

According to a fifth aspect, a method to express Gas Vesicles in a mammalian cell is described. The method comprises introducing into the mammalian cell a genetically engineered Gas Vesicle expression system (GVES) herein described for a time and under conditions to allow expression of GV proteins encoded by the gvp genes of the GVES and production of the Gas vesicle type in the mammalian cell.

According to a sixth aspect, a genetically engineered mammalian cell is described comprising the Gas Vesicle expression system (GVES) and/or Gas Vesicle Polynucleotide Construct (GVPC) herein described, configured for expression in the genetically engineered mammalian cell.

According to a seventh aspect, a method to provide a gas vesicle in a mammalian host is described. The method comprises introducing into a cell of the mammalian host the genetically engineered Gas Vesicle expression system (GVES), the introducing performed for a time and under conditions to allow expression of the GV proteins encoded by the gvp genes of the GVES and the production of the Gas Vesicle type in the mammalian cell.

According to an eighth aspect, a genetically engineered non-human mammalian host is described comprising the Gas Vesicle expression system (GVES) and/or Gas Vesicle Polynucleotide Construct (GVPC) herein described, configured for expression in a mammalian cell of the GV proteins encoded by the gvp genes of the GVES and the production of the Gas Vesicle type in the genetically engineered non-human mammalian host.

According to a ninth aspect, a method and system to provide a genetically engineered mammalian cell comprising a GVR genetic circuit are described, the method comprising:

genetically engineering the mammalian cell to introduce into the mammalian cell one or more genetically engineered Gas Vesicle Reporting Molecular Components (GVRMC) herein described

wherein at least one of the gvpB gene expression cassette and one or more additional gvp gene expression cassettes comprise a gas vesicle reporting (GVR) target region configured to be activated and/or inhibited by a molecular component of the GVR genetic circuit, to provide a Gas Vesicle Reporting Genetic Circuit (GVRGC) herein described.

According to a tenth aspect, a method is described to image a biochemical event in a mammalian cell comprised in an imaging target site, the method comprising:

introducing into the mammalian cell a Gas Vesicle Reporting Molecular Component (GVRMC) herein described to provide a GVR genetic circuit in which expression of GV proteins encoded by the gvp genes of the GVRMC and production of the GV type or an intracellular spatial translocation of the GV type occurs when the GVR genetic circuit operates according to the circuit design in response to the biochemical event,

the introducing performed for a time and under conditions allowing expression of the GV proteins and production of the GV type or an intracellular spatial translocation of the GV type in response to the biochemical event; and

imaging the target site comprising the mammalian host by applying a magnetic field and/or ultrasound to obtain an MRI and/or an ultrasound image of the target site.

The system comprises the genetically engineered Gas Vesicle expression system (GVES), Gas Vesicle Polynucleotide Construct (GVPC), Gas Vesicle Reporting Molecular Components (GVRMC) and/or GVR genetic circuits (GVRGC), related components and/or mammalian host cells in a combination for simultaneous combined or sequential use in the imaging methods herein described.

According to an eleventh aspect, a method is described to label a target mammalian host, the method comprising:

introducing into the mammalian cell a Gas Vesicle Reporting Molecular Component (GVRMC) herein described to provide a GVR genetic circuit in which expression of GV proteins encoded by the gvp genes of the GVRMC and production of the GV type or an intracellular spatial translocation of the GV type occurs when the GVR genetic circuit operates according to the circuit design in response to a trigger molecular component.

In the method, the introducing is performed under conditions resulting in presence of the trigger molecular component in the target mammalian host.

In some embodiments, the method can further comprise imaging the target site comprising the target mammalian host, by applying a magnetic field and/or ultrasound to obtain an MRI and/or an ultrasound image of the target site.

The system comprises the genetically engineered GVES, GVPC, related polynucleotide constructs, GVR genetic circuits, related components and/or mammalian host cells in a combination for simultaneous combined or sequential use in the imaging methods herein described.

According to a twelfth aspect, a composition is described. The composition comprises a genetically engineered Gas Vesicle expression system (GVES), Gas Vesicle Polynucleotide Construct (GVPC), Gas Vesicle Reporting Molecular Components (GVRMC) and/or GVR genetic circuits (GVRGC) of the disclosure, vectors, and/or genetically engineered mammalian cells described herein together with a suitable vehicle.

The Gas Vesicle expression system (GVES), Gas Vesicle Polynucleotide Construct (GVPC), Gas Vesicle Reporting Molecular Components (GVRMC), GVR genetic circuits (GVRGC), related vectors, genetically engineered mammalian cells, compositions, methods and systems can be used in several embodiments for reporting biochemical events in a mammalian cell in vitro, or in vivo, and in particular can be used for non-invasive reporting of biochemical events in mammalian cells using contrast-enhanced imaging techniques such as MRI and/or ultrasound, two widely available techniques with high resolution and deep tissue penetration.

In particular, in several embodiments described herein, the Gas Vesicle expression system (GVES), Gas Vesicle Polynucleotide Construct (GVPC), Gas Vesicle Reporting Molecular Components (GVRMC), GVR genetic circuits (GVRGC), related vectors, genetically engineered mammalian cells, compositions, methods and systems can be used to report the location of mammalian cells configured to express one or more GV types within an imaging target site, and/or sense and report one or more biochemical events in a mammalian cell configured to express one or more GV types within an imaging target site.

The GVES, and related GV polynucleotide constructs, GV reporting molecular components, GVR genetic circuits, vectors, genetically engineered mammalian cells, genetically engineered non-human mammals, compositions, methods and systems herein described, can be used in several embodiments to allow multiplexed imaging of a mammalian cell using parametric MRI, and differential acoustic sensitivity and background-free MRI when combined with ultrasound detection.

The GVES, and related GV polynucleotide constructs, GV reporting molecular components, GVR genetic circuits, vectors, genetically engineered mammalian cells, genetically engineered non-human mammals, compositions, methods and systems herein described, can be used in several embodiments to detect events such as multiple gene expression, proteolysis and/or biochemical reactions by clustering-induced changes in MRI contrast, which also enable the design of dynamic molecular sensors.

The GVES, and related GV polynucleotide constructs, GV reporting molecular components, GVR genetic circuits, vectors, genetically engineered mammalian cells, genetically engineered non-human mammals, compositions, methods and systems herein described, can be used in several embodiments to report biochemical events in mammalian cells and/or hosts through multiplexing, multimodal MRI and/or ultrasound detection.

The GVES, and related GV polynucleotide constructs, GV reporting molecular components, GVR genetic circuits, vectors, genetically engineered mammalian cells, genetically engineered non-human mammals, compositions, methods and systems herein described, can be used in several embodiments to produce dynamic contrast in response to local molecular signals in mammalian cells and/or hosts.

The GVES, and related GV polynucleotide constructs, GV reporting molecular components, GVR genetic circuits, vectors, genetically engineered mammalian cells, genetically engineered non-human mammals, compositions, methods and systems herein described, can be used in several embodiments to provide ultrasound imaging of mammalian cells allowing for sensitive and selective ultrasound imaging in order to detect gas vesicle-expressing cells at volumetric concentrations below 0.5% in vitro, and/or to image gene expression in mammals in vivo using ultrasound.

The GVES, and related GV polynucleotide constructs, GV reporting molecular components, GVR genetic circuits, vectors, genetically engineered mammalian cells, genetically engineered non-human mammals, compositions, methods and systems herein described, can be used in several embodiments to track movement of mammalian cells in target sites of interest such as mammalian tumor cells, immune cells, red blood cells, and stem cells within the body of an individual or other environments.

The GVES, and related GV polynucleotide constructs, GV reporting molecular components, GVR genetic circuits, vectors, genetically engineered mammalian cells, genetically engineered non-human mammals, compositions, methods and systems herein described, can in some embodiments be used to allow measures of fluid flows within blood and lymphatic circulation systems by detecting the spatial location of the ultrasound contrast produced by the cells in an image and tracking the spatial changes of that contrast over time as well as measuring movement of cells inside a tissue as will be understood by a skilled person.

The GVES, and related GV polynucleotide constructs, GV reporting molecular components, GVR genetic circuits, vectors, genetically engineered mammalian cells, genetically engineered non-human mammals, compositions, methods and systems herein described can be used in connection with various applications wherein reporting of biological events, labeling of mammalian cells, and/or tracking of their movement in a target site is desired.

For example, the GVES, and related GV polynucleotide constructs, GV reporting molecular components, GVR genetic circuits, vectors, genetically engineered mammalian cells, genetically engineered non-human mammals, compositions, methods and systems herein described, can be used for visualization of biological events, such as gene expression, proteolysis, biochemical reactions, such as production of signaling molecules and ion concentration changes, as well as cell location on a target site (e.g. tumor cells inside a host individual, such as mammalian hosts).

The GVES, and related GV polynucleotide constructs, GV reporting molecular components, GVR genetic circuits, vectors, genetically engineered mammalian cells, genetically engineered non-human mammals, compositions, methods and systems herein described, can also be used in developmental biology, the development and monitoring of diagnostic and therapeutic cellular agents and/or of genetic therapeutic circuits (for example to correct or modify genetic disorders) in medical applications, as well as diagnostics applications, such as monitoring of therapeutic cell/agent efficacy and safety during developmental stages and clinical usage.

Additional exemplary applications include uses of the GVES, and related polynucleotide constructs, GVR genetic circuits, vectors, genetically engineered mammalian cells, genetically engineered non-human mammals, compositions, methods and systems herein described in several fields including basic biology research, applied biology, bio-engineering, bio-energy, medical research, medical diagnostics, therapeutics, and in additional fields identifiable by a skilled person upon reading of the present disclosure.

The details of one or more embodiments of the disclosure are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated into and constitute a part of this specification, illustrate one or more embodiments of the present disclosure and, together with the detailed description and the examples, serve to explain the principles and implementations of the disclosure.

FIG. 1 shows an exemplary Clustal Omega alignment of amino acid sequences of selected exemplary gvpA and gvpB proteins (SEQ ID NO: 7-10 and 457-472).

FIG. 2 shows exemplary phylogenetic relationships of the gvpA protein sequences from the indicated prokaryotic species. [1]

FIG. 3 shows exemplary phylogenetic relationships of the gvpF and gvpL protein sequences from the indicated prokaryotic species. [1]

FIG. 4 shows exemplary phylogenetic relationships of the gvpN protein sequences from the indicated prokaryotic species. [1]

FIG. 5 shows diagrams illustrating the organization of exemplary gas vesicle gene clusters. Gas vesicle gene clusters from the indicated organisms are shown, with genes shown as block-shaped arrows, and genes of predicted similar function indicated in the same shade of grey. The direction of the transcription of genes within a gene cluster is indicated by the direction of the block-shaped arrows, and genes grouped together having block arrows pointed in the same direction are typically organized in the same operon. The scale bar indicates 1 kb. [1]

FIG. 6 shows diagrams illustrating organization of exemplary gvp gene clusters, wherein each letter indicates a gvp gene, and an arrow beneath a group of letters indicates an operon, with the direction of the arrow indicating the direction of transcription. [2]

FIG. 7 illustrates the expression of an exemplary B. megaterium gene cluster for gas vesicle formation. In particular, FIG. 7 top panel shows a schematic representation of bacterial gas vesicle gene clusters used for heterologous expression of gas vesicles in E. coli. FIG. 7 bottom panel shows representative whole cell TEM images of E. coli Rosetta 2(DE3)pLysS cells after expression of gas vesicle genes for 22 hours. Scale bars represent 500 nm. Expression performed as in Farhadi et al. 2018 (21) and TEM imaging as in Bourdeau et al. Nature, 2018 (13). The results indicate that gvpR and gvpT genes in the B. megaterium gene cluster are not necessary for gas vesicle formation.

FIG. 8 shows a schematic illustration of an assay for tolerability of P2A peptide additions. In particular, FIG. 8 provides a schematic illustration of a gas vesicle gene cluster with N-terminal modifications (left) or C-terminal modifications (right) of each gene (SEQ ID NO: 479 and 480) to test tolerability of P2A peptides, in a one-by-one setting in E. coli.

FIGS. 9A-9C illustrate an exemplary identification of bottleneck genes on an exemplary polycistronic gas vesicle gene plasmid. FIG. 9A shows a schematic representation of the experimental approach. FIG. 9B shows a chart reporting a qualitative estimate of the relative number of gas vesicles produced when each indicated gene was supplied solely by the polycistronic plasmid. FIG. 9C shows representative TEM images of gas vesicles in the lysate of HEK293T cells for all 8 assays. Scale bars represent 500 nm.

FIGS. 10A-10B illustrate testing of regulatory genes in a genetic construct and sorting of resulting cell line. FIG. 10A shows a schematic representation of a genetic construct including exemplary regulatory regions usable in polynucleotide constructs of the present disclosure. FIG. 10B shows a diagram reporting FACS of mCherry cells, with selected cells indicated with dark gray dots.

FIG. 11 illustrates results of fluorescence activated cell sorting of HEK293-tetON and CHO-tetON cells transfected with integrating mARG constructs herein described. FIG. 11 Panel A shows a schematic representation of the integrating constructs used to generate polyclonal cell lines. FIG. 11 Panel B shows a chart illustrating FACS of mARG-expressing HEK293-tetON cells. The cells are sorted for each group (subtype 1, subtype 2, subtype 3, subtype 4) as indicated with the remaining smaller gray dots indicating unsorted population. FIG. 11 Panel C shows a chart illustrating the relative fluorescence of the four polyclonal subtypes sorted. Dark gray bars indicate mCherry expression; light gray bars indicate EmGFP and eBFP2 expression. FIG. 11 Panel D shows a chart reporting the approximate gas vesicle yield from polyclonal cells in each subtype. FIG. 11 Panel E shows a chart reporting FACS of mARG-expressing CHO-tetON cells. Dark gray data indicate cells sorted in subtype 1 and small light gray dots are unsorted cells. FIG. 11 Panel F shows a representative TEM image of buoyancy-enriched lysate from CHO-tetON cells sorted as indicated in FIG. 11 Panel E. Scale bar represents 500 nm. FIG. 11 Panel G shows a chart reporting the approximate gas vesicle yield for the sorted mARG-expressing CHO-tetON cells.

FIG. 12 illustrates an approach for engineering a mammalian cell through transformation of the cell with an exemplary GVES of the disclosure. In particular, FIG. 12 Panel A shows a schematic illustration of the transient co-transfection assay used to identify combinations of genes capable of producing gas vesicles in mammalian cells. FIG. 12 Panel B shows a schematic representation of nine expression cassettes comprising genes from B. megaterium capable of encoding gas vesicle expression in mammalian cells. Thin arrow denotes CMV promoter. polyA denotes SV40 polyadenylation element. FIG. 12 Panel C shows a representative TEM image of purified gas vesicles expressed in HEK293T cells. FIG. 12 Panel D shows a schematic representation of gene cassettes comprising the mammalian acoustic reporter gene construct, mARG. FIG. 12 Panel E shows a representative TEM image of gas vesicles purified from HEK293T cells transiently transfected with mARGs for 72 hours. All scale bars represent 500 nm.

FIG. 13 illustrates formation, properties and non-toxicity of gas vesicles in cells with genome-integrated mammalian acoustic reporter genes. FIG. 13 Panel A shows a schematic representation of mARG constructs used for genomic integration into cells with the piggyBac transposase system. ITR, inverted terminal repeat; ChβGI, Chicken beta-globin insulator; GFP, Emerald green fluorescent protein; BFP, enhanced blue fluorescent protein 2. FIG. 13 Panel B shows a representative TEM image of buoyancy-enriched lysate from HEK293-tetON cells transfected with the constructs in FIG. 13 Panel A and sorted for high expression of all three operons. FIG. 13 Panel C shows fluorescence-activated cell sorting of HEK293-tetON cells transfected with the constructs in FIG. 13 Panel A. Large gray circles denote individual cells selected by sorting to form monoclonal cell lines. FIG. 13 Panel D shows a flowchart illustrating a selection process for monoclonal cell lines, including assays for viability, fluorescence intensity and gas vesicle yield. FIG. 13 Panel E shows a chart illustrating the number of gas vesicles expressed by monoclonal HEK293-tetON cells after 72 hours of induced expression, as counted in lysates using TEM. Bar represents the mean and the shaded area represents SEM (n=3, each from two technical replicates). FIG. 13 Panel F shows a representative TEM image of a 60-nm section through an mARG-HEK cell showing an angled slice through two bundles of gas vesicles in the cytosol. FIG. 13 Panel G shows a representative TEM image of gas vesicles purified from mARG-HEK cells. FIG. 13 Panel H shows the size distribution of gas vesicles expressed in mARG-HEK cells. The mean and standard deviation of both distributions are illustrated as a circle with error bars (n=1828). FIG. 13 Panel I shows phase contrast images of mARG-HEK and mCherry-HEK cells 72 hours after induction with 1 μg/mL doxycycline and 5 mM sodium butyrate. FIG. 13 Panel J shows a diagram reporting cell viability of mARG-HEK cells relative to mCherry-HEK cells after 72 hours of gene expression. Error bars indicate SEM. FIG. 13 Panel K shows a chart reporting a fraction of mARG-HEK cells in co-culture with mARG-mCherry cells seeded in equal numbers over 6 days of gene expression (n=3 biological replicates, each from 4 technical replicates, with darker symbols showing the mean). Scale bars in B, F, G represent 500 nm. Scale bar in I represents 20 μm.

FIG. 14 illustrates an exemplary ultrasound imaging of mammalian gene expression in vitro. FIG. 14 Panel A shows a schematic illustration of the collapse-based ultrasound imaging paradigm used to generate gas vesicle-specific ultrasound contrast from mARG-expressing cells. FIG. 14 Panel B shows a chart reporting a representative non-linear signal recorded during a step change in the incident acoustic pressure, from 0.27 MPa in the white-shaded region to 1.57 MPa in the grey-shaded region, exemplifying BURST ultrasound imaging. FIG. 14 Panel C shows a grayscale version of representative collapse and post-collapse ultrasound images of mARG-HEK and mCherry-HEK cells acquired during this ultrasound imaging paradigm and their difference, indicating gas vesicle-specific contrast. FIG. 14 Panel D shows a chart reporting cellular viability after being insonated under 3.2 MPa acoustic pressures, as measured using the MTT assay. FIG. 14 Panel E shows a schematic representation of a chemically inducible gene circuit with mARG expression as its output. All three mARG cassettes in mARG-HEK cells are under the control of the doxycycline-inducible TRE3G promoter (TRE), with expression triggered by incubation with doxycycline. FIG. 14 Panel F shows a grayscale version of representative ultrasound images and contrast measurements in mARG-HEK cells as a function of time following induction with 1 μg/mL of doxycycline and 5 mM sodium butyrate (n=6, with the darker dots showing the mean). FIG. 14 Panel G shows a grayscale version of representative ultrasound images and contrast measurements in mARG-HEK cells as a function of doxycycline induction concentrations. Cells were allowed to express gas vesicles for 72 hours in the presence of 5 mM sodium butyrate (n=6, with the darker dots showing the mean). A sigmoidal function is fitted as a visual guide. FIG. 14 Panel H shows a grayscale version of representative ultrasound images and contrast measurements in mARG-HEK cells mixed with mCherry-HEK cells in varying proportions. Cells were induced with 1 μg/mL of doxycycline and 5 mM sodium butyrate for 72 hours prior to imaging (n=4, with the darker dots showing the mean). FIG. 14 Panel I shows a schematic representation and a grayscale version of representative ultrasound images from mARG-HEK cells in Matrigel re-expressing gas vesicles after acoustic collapse. Cells were induced with 1 μg/mL of doxycycline and 5 mM sodium butyrate for 72 hours before and after 3.2 MPa acoustic insonation. Ultrasound images were acquired after an additional 72 hours in culture following collapse. FIG. 14 Panel J shows a chart reporting results of ultrasound contrast in mARG-HEK and mCherry-HEK cells after initial expression, after collapse, after re-expression and after second collapse (n=7, with the darker dots showing the mean). GV, gas vesicles. All scale bars represent 1 mm.

FIG. 15 illustrates an exemplary ultrasound imaging of mammalian gene expression in vivo. FIG. 15 Panel A shows a schematic illustration of an approach wherein a mouse is implanted with a subcutaneous tumor model, and the related expected spatial pattern of vascularization and doxycycline-induced reporter gene expression. FIG. 15 Panel B shows a chart reporting an exemplary experimental timeline. FIG. 15 Panel C shows a grayscale version of a representative ultrasound image of tumors containing mARG-HEK cells after 4 days of doxycycline administration. The arrow indicates the mARG-specific BURST ultrasound image. mARG-specific contrast shown in the grayscale version of the hot colormap is overlaid on an anatomical B-mode image showing the background anatomy. FIG. 15 Panel D shows a grayscale version of a representative ultrasound image of tumors containing mCherry-HEK cells after 4 days of doxycycline administration. FIG. 15 Panel E shows a grayscale version of ultrasound images of adjacent planes in the mARG-HEK tumor acquired at 1 mm intervals. The minimum and maximum values of scale bars in the original ultrasound images of Panels C-E are 4000 and 40000 au, respectively. FIG. 15 Panel F shows a grayscale version of a representative fluorescence image of a histological tissue section of a mARG-HEK tumor. The light gray color shows the GFP and mCherry fluorescence around the periphery of the tumor. FIG. 15 Panel G shows a grayscale version of a fluorescence image of a mouse implanted with mARG-HEK and mCherry-HEK tumors on the left and right flanks, respectively, as outlined with arrows, after 4 days of expression. Scale bars are 1 mm for C-F and 1 cm for G.

FIG. 16 shows a graph illustrating the co-culture of reporter gene expressing cells with HEK293T cells. Fraction of mARG-HEK cells in co-culture with HEK293T cells (circle) or mARG-mCherry cells in co-culture with HEK293T cells (square) seeded in equal numbers over 6 days of gene expression (n=3 biological replicates, each from 4 technical replicates, with darker dots showing the mean).

FIG. 17 shows fluorescence measurements of gene expression as a function of time and inducer concentration in mARG-HEK cells. FIG. 17 Panel A shows a chart illustrating mCherry fluorescence of mARG-HEK cells induced with 1 μg/mL doxycycline and 5 mM sodium butyrate at the indicated times after induction (n=4, with the darker dots showing the mean). FIG. 17 Panel B shows a chart reporting mCherry fluorescence of mARG-HEK cells with the indicated inducer concentration and 5 mM sodium butyrate after 72 hours of induction (n=7, with the darker dots showing the mean).

FIG. 18 shows a chart illustrating a relative ultrasound contrast produced by mARG-HEK cells in hydrogel as a function of the estimated average number of gas vesicles (GV) per nanoliter. Gray circle symbols represent results from mARG-HEK cells induced with 1 μg/mL doxycycline for 3 days (producing on average 45 gas vesicles per cell) mixed with mCherry-HEK cells (expressing no gas vesicles) in varying proportions, as presented in FIG. 14 Panel H. Gray square symbols represent results from mARG-HEK cells induced with 0.01, 0.05, 0.1 and 1 μg/mL doxycycline for 3 days, expressing on average 0.01±0.004, 1.4±0.4, 3.5±0.3, 45±5.1 (mean±SEM) gas vesicles per cell, respectively, as quantified by TEM. Dark symbols show the mean of ultrasound contrast for 4 replicates. Error bars represent SEM of 4 biological replicates for 0.01, 0.05, 0.1 μg/mL induction and n=3 biological replicates (each from two technical replicates) for 1 μg/mL samples.

FIG. 19 shows exemplary in vivo ultrasound images of adjacent planes in mARG-HEK tumors acquired at 1 mm intervals. For each imaging slice the difference heatmap of nonlinear signal between frame 1 and frame 4 is overlaid on a grayscale anatomical image. Minimum and maximum values of the color bar are 4000 and 40000, respectively. White arrows indicate the location of mARG-specific BURST ultrasound signal. Scale bars are 1 mm.

FIG. 20 shows representative Doppler ultrasound images of tumors containing mARG-HEK cells. Doppler ultrasound images were acquired using 250 frames of ultrafast plane waves at 25 V and used to reconstruct vascular maps plotted as normalized power Doppler signal overlaid on anatomical images in grayscale. White arrows indicate the location of vasculature, which is seen by Doppler ultrasound around the tumor and not in the core of the tumor. Scale bars represent 1 mm.

FIG. 21 shows representative histology sections of tumors containing mARG-HEK cells. For each mouse, two neighboring sections are presented. The light gray color shows the GFP and mCherry fluorescence around the periphery of the tumor.

FIG. 22 shows biological replicates of in vivo ultrasound imaging of gene expression. In particular, in FIG. 22 Panel A, the left column shows ultrasound images of tumors containing mARG-HEK cells after 4 days of doxycycline administration. The right column shows ultrasound images of tumors containing mCherry-HEK cells after 4 days of doxycycline administration. After imaging, the tumors were insonated with 3.2 MPa of ultrasound to collapse the expressed gas vesicles. In FIG. 22 Panel B, the left column shows ultrasound images of tumors containing mARG-HEK cells re-expressing gas vesicles after an additional 4 days of doxycycline administration. The right column shows ultrasound images of tumors containing mCherry-HEK cells after an additional 4 days of doxycycline administration. The difference heatmap of nonlinear signal between frame 1 and frame 4 is overlaid on a grayscale anatomical ultrasound image. Minimum and maximum values on the color bar are 4000 and 40000, respectively. White arrows indicate the location of mARG-specific BURST ultrasound signal. Scale bars represent 1 mm.

FIG. 23 shows an exemplary configuration of a construct designed to allow expression of two different GV types in one prokaryotic cell.

FIGS. 24A-24D illustrate a consolidated mARG construct comprising 2 gene cassettes enabling mammalian gas vesicle expression. FIG. 24A shows a schematic representation of two gene cassettes integrated into the genome of HEK293-tetON cells. In the top construct gvpB is separated from gvpN by an internal ribosome entry sequence (IRES, shown as a box between gvpB and gvpN). The promoters, as illustrated by thin arrows, are TRE3G doxycycline-inducible promoters. FIG. 24B shows a representative TEM image of GVs in the lysate of HEK293-tetON cells transfected with the constructs in FIG. 24A and induced with 1 μg/mL doxycycline. FIG. 24C illustrates an alternative consolidated mARG construct comprising 2 gene cassettes enabling mammalian GV expression. In the top construct gvpB is separated from gvpF by an IRES. The promoters, as illustrated by thin arrows, are CMV promoters. FIG. 24D shows a representative BURST ultrasound image of HEK293T cells expressing the constructs in FIG. 24C. HEK293T control cells without GV genes do not produce BURST ultrasound signal.

FIG. 25A shows HEK293T cells transfected with Ana-gvpA and the constructs in Table 13 and Table 14. After 72 hours of expression, representative BURST ultrasound signal is quantified. HEK293T control cells without GV genes do not produce BURST ultrasound signal.

FIG. 25B shows Ana-gvpA, Ana-gvpC, Ana-gvpN from Table 10 together with B. megaterium GVS genes from Table 8. HEK293T cells expressing these hybrid genes were able to produce gas vesicles as detectable by BURST ultrasound imaging.

FIG. 26A shows HEK293T cells that have been transfected with Ana-gvpA, Ana-gvpC, Ana-gvpN, Ana-gvpJ, Ana-gvpK, Ana-gvpF, Ana-gvpG, Ana-gvpV, Ana-gvpW, and after 72 hours imaged with BURST ultrasound imaging. FIG. 26B shows HEK293T cells that have been transfected with Ana-gvpA, Ana-gvpN, Ana-gvpJ, Ana-gvpK, Ana-gvpF, Ana-gvpG, Ana-gvpV, Ana-gvpW, and after 72 hours cell lysate imaged with TEM. FIG. 26C shows HEK293T cells that have been transfected with Ana-gvpA, Ana-gvpN, Ana-gvpJ, Ana-gvpK, Ana-gvpF, Ana-gvpG, Ana-gvpW, and after 72 hours cell lysate imaged with TEM. FIG. 26D shows HEK293T cells that have been transfected with Ana-gvpA, Ana-gvpJ, Ana-gvpK, Ana-gvpF, Ana-gvpG, Ana-gvpW, and after 72 hours cell lysate imaged with TEM. White arrows indicate small gas vesicle particles. FIG. 26E shows HEK293T cells transfected with Ana GV genes with gene sequences acquired from the NCBI database (denoted as Ana NCBI gvpG) and GV genes with gene sequences sequenced directly from native GV-expressing Anabaena flos-aquae cells. Representative BURST ultrasound images were quantified.

FIG. 27 shows HEK293T cells transfected with Ana GV genes from Table 10. Cells transfected with the constructs expressed GV proteins for 72 hours before ultrasound imaging. FIG. 27 Panel A shows representative BURST ultrasound images of HEK293T cells expressing Ana-gvpA, Ana-gvpC, Ana-gvpN, Ana-gvpJ, Ana-gvpK, Ana-gvpF, Ana-gvpG, Ana-gvpV, Ana-gvpW on the left and Ana-gvpA, Ana-gvpN, Ana-gvpJ, Ana-gvpK, Ana-gvpF, Ana-gvpG, Ana-gvpV, Ana-gvpW on the right. FIG. 27 Panel B shows representative nonlinear signals with amplitude modulation ultrasound images of HEK293T cells expressing Ana-gvpA, Ana-gvpC, Ana-gvpN, Ana-gvpJ, Ana-gvpK, Ana-gvpF, Ana-gvpG, Ana-gvpV, Ana-gvpW on the left and Ana-gvpA, Ana-gvpN, Ana-gvpJ, Ana-gvpK, Ana-gvpF, Ana-gvpG, Ana-gvpV, Ana-gvpW on the right.

DETAILED DESCRIPTION

Provided herein are genetically engineered gas vesicle expression systems (GVES) and related polynucleotide constructs configured for expression of a gas vesicle (GV) in a mammalian cell, and related gas vesicle gene clusters, gas vesicles, genetic circuits, vectors, genetically engineered mammalian cells, compositions, methods and systems.

The wordings “gas vesicles”, “GV”, “gas vesicles protein structure”, or “GVPS”, refer to a gas-filled protein structure natively intracellularly expressed by certain bacteria or archaea as a mechanism to regulate cellular buoyancy in aqueous environments [3]. In particular, gas vesicles are protein structures natively expressed almost exclusively in microorganisms from aquatic habitats, to provide buoyancy by lowering the density of the cells [3]. GVs have been found in over 150 species of prokaryotes, comprising cyanobacteria and bacteria other than cyanobacteria [4, 5], from at least 5 of the 11 phyla of bacteria and 2 of the phyla of archaea described by Woese (1987) [6]. Exemplary microorganisms expressing or carrying gas vesicle protein structures and/or related genes include cyanobacteria such as Microcystis aeruginosa, Aphanizomenon flos aquae, Oscillatoria agardhii, Anabaena, Microchaete diplosiphon and Nostoc; phototrophic bacteria such as Amoebobacter, Thiodictyon, Pelodictyon, and Ancalochloris; non-phototrophic bacteria such as Microcyclus aquaticus; Gram-positive bacteria such as Bacillus megaterium; Gram-negative bacteria such as Serratia; and archaea such as Haloferax mediterranei, Methanosarcina barkeri, and Halobacterium salinarum, as well as additional microorganisms identifiable by a skilled person.

In particular, a GV in the sense of the disclosure is an intracellularly expressed structure forming a hollow structure wherein a gas is enclosed by a protein shell, which is a shell substantially made of protein (at least 95% protein). In gas vesicles in the sense of the disclosure, the protein shell is formed by a plurality of proteins herein also indicated as GV proteins or gvps, which form in the cytoplasm a gas permeable and liquid impermeable protein shell configuration encircling gas. Accordingly, a protein shell of a GV is permeable to gas but not to surrounding liquid such as water. In particular, GV protein shells exclude water but permit gas to freely diffuse in and out from the surrounding media [7], making them physically stable despite their usual nanometer size, unlike microbubbles, which trap pre-loaded gas in an unstable configuration.

GV structures are typically nanostructures with widths and lengths of nanometer dimensions (in particular with widths of 45-250 nm and lengths of 100-800 nm) but can have lengths up to 2 μm in prokaryotes, or larger dimensions such as up to 8-10 μm, as will be understood by a skilled person upon reading of the present disclosure. In certain embodiments, the gas vesicle protein structures have average dimensions of 1000 nm or less, such as 900 nm or less, including 800 nm or less, or 700 nm or less, or 600 nm or less, or 500 nm or less, or 400 nm or less, or 300 nm or less, or 250 nm or less, or 200 nm or less, or 150 nm or less, or 100 nm or less, or 75 nm or less, or 50 nm or less, or 25 nm or less, or 10 nm or less. For example, the average diameter of the gas vesicles may range from 10 nm to 1000 nm, such as 25 nm to 500 nm, including 50 nm to 250 nm, or 100 nm to 250 nm. By “average” is meant the arithmetic mean.

GVs in the sense of the disclosure have different shapes depending on their genetic origins [7]. For example, GVs in the sense of the disclosure can be substantially spherical, ellipsoid, cylindrical, or have other shapes such as football shape or cylindrical with cone shaped end portions depending on the type of bacteria providing the gas vesicles.

Representative examples of endogenously expressed GVs native to bacterial or archaeal species are the gas vesicle protein structures produced by the cyanobacterium Anabaena flos-aquae (Ana GVs) [3], and the halobacterium Halobacterium salinarum (Halo GVs) [8]. In particular, Ana GVs are cone-tipped cylindrical structures with a diameter of approximately 140 nm and length of up to 2 μm and in particular 200-800 nm or longer. Halo GVs are typically spindle-like structures with a maximal diameter of approximately 250 nm and length of 250-600 nm.

In bacteria or archaea expressing GVs, the genes (herein also gvp genes) encoding for the proteins forming the GVs (herein also GV proteins), are organized in a gas vesicle gene cluster of 8 to 14 different genes depending on the host bacteria or archaea, as will be understood by a skilled person.

The term “Gas Vesicle Genes Cluster” or “GVGC” as described herein indicates a gene cluster encoding a set of GV proteins capable of providing a GV upon expression within a bacterial or archaeal cell. Since the ability of expressed GV proteins to assemble in a GV depends on the cell environment where GV proteins are expressed and a same group of gvp genes may or may not form a GV upon expression in a cell, gvp genes provide GVGCs in a cell dependent manner as will be understood by a skilled person (see on point U.S. application Ser. No. 15/663,635 published as US 2018/0030501).

The term “gene cluster” as used herein means a group of two or more genes found within an organism's DNA that encode two or more polypeptides or proteins, which collectively share a generalized function or are genetically regulated together to produce a cellular structure and are often located within a few thousand base pairs of each other. The size of gene clusters can vary significantly, from a few genes to several hundred genes [9]. Portions of the DNA sequence of each gene within a gene cluster are sometimes found to be similar or identical; however, the resulting protein of each gene is distinctive from the resulting protein of another gene within the cluster. Genes found in a gene cluster can be observed near one another on the same chromosome or native plasmid DNA, or on different, but homologous chromosomes. An example of a gene cluster is the Hox gene cluster, which is made up of eight genes and is part of the Homeobox gene family. In the sense of the disclosure, gene clusters as described herein also comprise gas vesicle gene clusters, wherein the expressed proteins thereof together are able to form gas vesicles.

The term “gene” as used herein indicates a polynucleotide encoding for a protein that in some instances can take the form of a unit of genomic DNA within a bacterium, plant, or other organism. The term gene as used herein includes naturally occurring polynucleotides encoding for a protein as well as engineered polynucleotides whose sequences have been modified from the original sequence, for example to optimize expression, e.g. through codon changes (see Examples section) and/or through introduction of N- and/or C-terminal modifications, while still maintaining the ability to encode for the protein encoded by the naturally occurring polynucleotide or a functional variant thereof.

The term “polynucleotide” as used herein indicates an organic polymer composed of two or more monomers including nucleotides, nucleosides or analogs thereof. The term “nucleotide” refers to any of several compounds that consist of a ribose or deoxyribose sugar joined to a purine or pyrimidine base and to a phosphate group and that are the basic structural units of nucleic acids. The term “nucleoside” refers to a compound (as guanosine or adenosine) that consists of a purine or pyrimidine base combined with deoxyribose or ribose and is found especially in nucleic acids. The term “nucleotide analog” or “nucleoside analog” refers respectively to a nucleotide or nucleoside in which one or more individual atoms have been replaced with a different atom or with a different functional group. Accordingly, the term polynucleotide includes nucleic acids of any length, and in particular DNA, RNA, analogs and fragments thereof.

The term “protein” as used herein indicates a polypeptide with a particular secondary and tertiary structure that can interact with another molecule and in particular, with other biomolecules including other proteins, DNA, RNA, lipids, metabolites, hormones, chemokines, and/or small molecules. The term “polypeptide” as used herein indicates an organic linear, circular, or branched polymer composed of two or more amino acid monomers and/or analogs thereof. The term “polypeptide” includes amino acid polymers of any length including full-length proteins and peptides, as well as analogs and fragments thereof. A polypeptide of three or more amino acids is also called a protein oligomer, peptide, or oligopeptide. In particular, the terms “peptide” and “oligopeptide” usually indicate a polypeptide with less than 100 amino acid monomers. In particular, in a protein, the polypeptide provides the primary structure of the protein, wherein the term “primary structure” of a protein refers to the sequence of amino acids in the polypeptide chain covalently linked to form the polypeptide polymer. A protein “sequence” indicates the order of the amino acids that form the primary structure. Covalent bonds between amino acids within the primary structure can include peptide bonds or disulfide bonds, and additional bonds identifiable by a skilled person. Polypeptides in the sense of the present disclosure are usually composed of a linear chain of alpha-amino acid residues covalently linked by a peptide bond or a synthetic covalent linkage. The two ends of the linear polypeptide chain encompassing the terminal residues and the adjacent segment are referred to as the carboxyl terminus (C-terminus) and the amino terminus (N-terminus) based on the nature of the free group on each extremity. Unless otherwise indicated, counting of residues in a polypeptide is performed from the N-terminal end (NH₂-group), which is the end where the amino group is not involved in a peptide bond, to the C-terminal end (-COOH group), which is the end where a COOH group is not involved in a peptide bond. Proteins and polypeptides can be identified by x-ray crystallography, direct sequencing, immunoprecipitation, and a variety of other methods as understood by a person skilled in the art. Proteins can be provided in vitro or in vivo by several methods identifiable by a skilled person. In some instances where the proteins are synthetic proteins, in at least a portion of the polymer two or more amino acid monomers and/or analogs thereof are joined through chemically-mediated condensation of an organic acid (-COOH) and an amine (-NH₂) to form an amide bond or a “peptide” bond.

As used herein the term “amino acid”, “amino acid monomer”, or “amino acid residue” refers to organic compounds composed of amine and carboxylic acid functional groups, along with a side-chain specific to each amino acid. In particular, alpha- or α-amino acid refers to organic compounds composed of amine (-NH₂) and carboxylic acid (-COOH), and a side-chain specific to each amino acid connected to an alpha carbon. Different amino acids have different side chains and have distinctive characteristics, such as charge, polarity, aromaticity, reduction potential, hydrophobicity, and pKa. Amino acids can be covalently linked to form a polymer through peptide bonds by reactions between the amine group of a first amino acid and the carboxylic acid group of a second amino acid. Amino acid in the sense of the disclosure refers to any of the twenty naturally occurring amino acids and to non-natural amino acids, and includes both D and L optical isomers.

In embodiments herein described, identification of a gene cluster encoding GV proteins naturally expressed in bacteria or archaea as described herein can be performed for example by isolating the GVs from the bacteria or archaea, isolating the protein for the protein shell of the GV and deriving the related amino acid sequence with methods and techniques identifiable by a skilled person (see e.g. procedures described in [10] [11]). The sequence of the genes encoding for the GV proteins can then be identified by methods and techniques identifiable by a skilled person. For example, gas vesicle gene clusters can also be identified by persons skilled in the art by performing gene sequencing or partial- or whole-genome sequencing of organisms using wet lab and in silico molecular biology techniques known to those skilled in the art. As understood by those skilled in the art, gas vesicle gene clusters can be located on the chromosomal DNA or native plasmid DNA of microorganisms. After performing DNA or cDNA isolation from a microorganism, the polynucleotide sequences or fragments thereof or PCR-amplified fragments thereof can be sequenced using DNA sequencing methods such as Sanger sequencing, DNASeq, RNASeq, whole genome sequencing, and other methods known in the art using commercially available DNA sequencing reagents and equipment, and then the DNA sequences analyzed using computer programs for DNA sequence analysis known to skilled persons.

In some embodiments, identification of a gene cluster encoding for GV proteins [8, 12, 13] can also be performed by screening DNA sequence databases such as GenBank, EMBL, DNA Data Bank of Japan, and others. Gas vesicle gene cluster gene sequences in databases such as those above can be searched using tools such as NCBI Nucleotide BLAST and the like, for gas vesicle gene sequences and homologs thereof, using gene sequence query methods known to those skilled in the art. For example, genes of the gene cluster for the exemplary haloarchaeal GVs (which have the largest number of different gvp genes) and their predicted function and features are illustrated in Example 26 of related U.S. application Ser. No. 15/613,104, filed on Jun. 2, 2017, which is incorporated herein by reference in its entirety. GV gene clusters can also be identified using a combination of genomic vicinity (e.g. antiSMASH), protein homology and prior GV gene annotation as will be understood by a skilled person.

A GV gene cluster encoding for GV proteins typically comprises Gas Vesicle Assembly (GVA) genes and Gas Vesicle Structural (GVS) genes.

The term Gas Vesicle Structural (GVS) proteins as used herein indicates proteins forming part of a gas-filled protein structure that is intracellularly expressed by certain bacteria or archaea and that can be used as a mechanism to regulate cellular buoyancy in aqueous environments [7]. In particular, a GV shell comprises a GVS protein identified as gvpA or gvpB (herein also referred to as gvpA/B) and optionally also a GVS protein identified as gvpC.

In particular, a gvpB gene is a gene encoding for gas vesicle structural protein B. The gvpB gene is highly homologous to the gvpA gene encoding for gas vesicle structural protein A. A gvpA/B is a protein of the GV shell that has a higher than 60% and possibly higher than 70% identity to the following consensus sequence: SSSLAEVLDRILDKGXVIDAWARVSLVGIEILTIEARVVIASVDTYLR (SEQ ID NO: 3) wherein X can be any amino acid. In particular in a gvpA/B of prokaryotes, the consensus sequence of SEQ ID NO: 3 typically forms a conserved secondary structure having an alpha-beta-beta-alpha structural motif formed by portions of the consensus sequence comprising the amino acids LDRILD (SEQ ID NO:4) having an alpha helical structure, RILDKGXVIDAWARVS (SEQ ID NO:5), wherein X can be any amino acid, having a beta strand-beta strand structure, and DTYLR (SEQ ID NO:6) having an alpha helical structure, as will be understood by a skilled person.
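By way of non-limiting illustration, the identity criterion against the consensus sequence of SEQ ID NO: 3 can be sketched in Python as follows. The sketch assumes that the candidate segment has already been aligned to the consensus without gaps, so that both strings have the same length, and that the wildcard X counts as a match to any residue. The function names `identity_to_consensus` and `is_gvpa_b_candidate` are hypothetical and are not part of the disclosure.

```python
# Consensus of SEQ ID NO: 3 as given in the text; 'X' stands for any amino acid.
GVPA_B_CONSENSUS = "SSSLAEVLDRILDKGXVIDAWARVSLVGIEILTIEARVVIASVDTYLR"


def identity_to_consensus(candidate: str, consensus: str = GVPA_B_CONSENSUS) -> float:
    """Return percent identity of a pre-aligned, ungapped candidate to the consensus.

    Assumption: the candidate has the same length as the consensus, and a
    wildcard 'X' in the consensus matches any residue.
    """
    if len(candidate) != len(consensus):
        raise ValueError("candidate must be pre-aligned to the consensus (same length)")
    matches = sum(
        1
        for residue, ref in zip(candidate.upper(), consensus)
        if ref == "X" or residue == ref
    )
    return 100.0 * matches / len(consensus)


def is_gvpa_b_candidate(candidate: str, threshold: float = 60.0) -> bool:
    """Apply the 'higher than 60%' identity criterion (70% can be passed instead)."""
    return identity_to_consensus(candidate) > threshold
```

A stricter screen can be obtained by passing `threshold=70.0`. In practice the alignment itself would be produced by a tool such as Protein BLAST before a check of this kind is applied.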

As used herein, “homology”, “sequence identity” or “identity” in the context of two nucleic acid or polypeptide sequences makes reference to the nucleotide bases or residues in the two sequences that are the same when aligned for maximum correspondence over a specified comparison window. When percentage of sequence identity or similarity is used in reference to proteins, it is recognized that residue positions which are not identical often differ by conservative amino acid substitutions, where amino acid residues are substituted with functionally equivalent residues having similar physicochemical properties, and therefore do not change the functional properties of the molecule.

A functionally equivalent residue of an amino acid as used herein typically refers to other amino acid residues having physicochemical and stereochemical characteristics substantially similar to the original amino acid. The physicochemical properties include water solubility (hydrophobicity or hydrophilicity), dielectric and electrochemical properties, physiological pH, partial charge of side chains (positive, negative or neutral) and other properties identifiable to a person skilled in the art. The stereochemical characteristics include spatial and conformational arrangement of the amino acids and their chirality. For example, glutamic acid is considered to be a functionally equivalent residue to aspartic acid in the sense of the current disclosure. Tyrosine and tryptophan are considered as functionally equivalent residues to phenylalanine. Arginine and lysine are considered as functionally equivalent residues to histidine.
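As a minimal illustrative sketch, the three exemplary equivalences named above can be encoded as residue groups and used to separate identical positions from conservatively substituted positions in two aligned sequences of equal length. The grouping below covers only the examples stated in the text and is not an exhaustive substitution scheme. Treating every member of a group as equivalent to every other member (for example tyrosine and tryptophan) is an assumption of this sketch, and the function names are hypothetical.

```python
# One-letter codes for the exemplary equivalences stated in the text:
# Glu/Asp; Tyr and Trp with Phe; Arg and Lys with His.
EQUIVALENCE_GROUPS = (
    frozenset("ED"),
    frozenset("FYW"),
    frozenset("HRK"),
)


def functionally_equivalent(a: str, b: str) -> bool:
    """True if two residues are identical or fall in the same exemplary group."""
    a, b = a.upper(), b.upper()
    if a == b:
        return True
    return any(a in group and b in group for group in EQUIVALENCE_GROUPS)


def count_identical_and_equivalent(seq_a: str, seq_b: str) -> tuple[int, int]:
    """Count identical positions and non-identical but equivalent positions.

    Assumption: both sequences are already aligned and have equal length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(zip(seq_a.upper(), seq_b.upper()))
    identical = sum(1 for x, y in pairs if x == y)
    equivalent = sum(1 for x, y in pairs if x != y and functionally_equivalent(x, y))
    return identical, equivalent
```

Keeping the two counts separate makes it possible to report identity and similarity side by side rather than folding conservative substitutions into a single number.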

A person skilled in the art would understand that similarity between sequences is typically measured by a process that comprises the steps of aligning the two polypeptide or polynucleotide sequences to form aligned sequences, then detecting the number of matched characters, i.e. characters similar or identical between the two aligned sequences, and calculating the total number of matched characters divided by the total number of aligned characters in each polypeptide or polynucleotide sequence, including gaps. The similarity result is expressed as a percentage of identity.

As used herein, “percentage of sequence identity” means the value determined by comparing two optimally aligned sequences over a comparison window, wherein the portion of the polynucleotide sequence in the comparison window may comprise additions or deletions (gaps) as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical nucleic acid base or amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison, and multiplying the result by 100 to yield the percentage of sequence identity.
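The calculation just described can be sketched in a few lines of Python. The sketch assumes the two sequences have already been optimally aligned by an external tool, that gaps are written as `-`, and that the comparison window is the full length of the aligned strings. The function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of sequence identity over a comparison window.

    matched positions / total positions in the window * 100, where a gap
    never counts as a match. Assumes the inputs are pre-aligned and that
    the window spans the whole alignment.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    matched = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    return 100.0 * matched / len(aligned_a)


# Example: 4 matched positions in a window of 6 positions (one gap, one mismatch).
example = percent_identity("ACGT-A", "ACCTGA")
```

Gapped positions stay in the denominator, consistent with the statement that the total number of positions in the window of comparison is used.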

As used herein, “reference sequence” is a defined sequence used as a basis for sequence comparison. A reference sequence may be a subset or the entirety of a specified sequence; for example, a segment of a full-length protein or protein fragment. A reference sequence can comprise, for example, a sequence identifiable in a database such as GenBank and UniProt and others identifiable to those skilled in the art.

As understood by those skilled in the art, determination of percent identity between any two sequences can be accomplished using a mathematical algorithm. Non-limiting examples of such mathematical algorithms are the algorithm of Myers and Miller [14]; the local homology algorithm of Smith et al. [15]; the homology alignment algorithm of Needleman and Wunsch [16]; the search-for-similarity-method of Pearson and Lipman [17]; and the algorithm of Karlin and Altschul [18], modified as in Karlin and Altschul [19]. Computer implementations of these mathematical algorithms can be utilized for comparison of sequences to determine sequence identity. Such implementations include, but are not limited to: CLUSTAL in the PC/Gene program (available from Intelligenetics, Mountain View, Calif.); the ALIGN program (Version 2.0) and GAP, BESTFIT, BLAST, FASTA [17], and TFASTA in the Wisconsin Genetics Software Package, Version 8 (available from Genetics Computer Group (GCG), 575 Science Drive, Madison, Wis., USA). Alignments using these programs can be performed using the default parameters.

Thus, a gvpA/B protein in a prokaryote of interest can be identified for example by isolating GVs from a prokaryote of interest, isolating the protein from the protein shell of the GV and obtaining the amino acid sequence of the isolated protein. In addition or in the alternative to isolating the GVs and isolating the protein, the method can include obtaining amino acid sequences of the shell proteins of the GV of the prokaryote of interest from available databases. The method further comprises performing a sequence alignment of the obtained amino acid sequences against the gvpA/B protein consensus sequence of SEQ ID NO:3.

In particular, isolating GVs from a prokaryote of interest can be performed following methods to isolate gas vesicles as described in U.S. application Ser. No. 15/613,104, filed on Jun. 2, 2017. Isolating the protein for the protein shell of the GV and obtaining the related amino acid sequence can be performed with tandem liquid chromatography mass-spectrometry alone or in combination with obtaining amino acid sequences of the isolated protein with wet lab techniques or from available databases comprising the sequences of the prokaryote of interest as well as additional techniques and approaches identifiable by a skilled person. Obtaining amino acid sequences of GV shell proteins of the prokaryote of interest can be performed by screening available databases of gene and protein sequences identifiable by a skilled person. A sequence alignment of the sequences of the isolated GV proteins or proteins encoded in the genome of a prokaryote of interest can be performed (using Protein BLAST or other alignment algorithms known in the art) against the gvpA/B protein consensus sequence of SEQ ID NO:3. In particular, a sequence alignment can be performed using gvpA/B protein sequences from the closest phylogenetic relative to the prokaryote of interest. Reference is made to Example 1 showing exemplary phylogenetic relationships between gvpA/B proteins of exemplary prokaryotic species.

The optional gvpC gene encodes for a gvpC protein which is a hydrophilic protein of a GV shell, including repetitions of one repeat region flanked by an N-terminal region and a C-terminal region. The term “repeat region” or “repeat” as used herein with reference to a protein refers to the minimum sequence that is present within the protein in multiple repetitions along the protein sequence without any gaps. Accordingly, in a gvpC multiple repetitions of a same repeat are flanked by an N-terminal region and a C-terminal region. In a same gvpC, repetitions of a same repeat in the gvpC protein can have different lengths and different sequence identity one with respect to another.

Repeat regions within any given gvpC sequence ‘X’ from organism ‘Y’ can be identified by comparing the related sequence with the sequence of a known gvpC (herein e.g. reference gvpC sequence ‘Z’). In particular, the comparing can be performed by aligning sequence ‘X’ to the reference gvpC sequence ‘Z’ using a sequence alignment tool such as BLASTP or other sequence alignment tools identifiable by a skilled person at the date of filing of the application upon reading of the present disclosure. In particular, a reference sequence ‘Z’ is chosen from a host that is the closest phylogenetic relative of ‘Y’, from a list of Anabaena flos-aquae, Halobacterium salinarum, Haloferax mediterranei, Microchaete diplosiphon and Nostoc sp. The sequence alignment of ‘X’ and ‘Z’ (e.g. a BLASTP) is performed by performing a first alignment of sequence X and sequence Z to identify a beginning and an end of a repeat in ‘X’ as well as a number of repetitions of the identified repeat, in accordance with the known repeats in ‘Z’. The first alignment results in at least one first aligned portion of X with respect to reference sequence Z. The aligning can also comprise performing a second alignment between the at least one first aligned portion of X identified following the first alignment and additional portions of X to identify at least one repeat ‘R1’ in X. Other repeats in ‘X’ (i.e. R2, R3, R4 . . . ) can subsequently be identified with respect to R1. In performing alignment steps, a sequence is identified as a repeat when the sequence shows at least 3 of the characteristics described in U.S. application Ser. No. 15/663,635 published as US 2018/0030501 (incorporated herein by reference in its entirety), which also includes additional features of gvpC proteins and the related identification.

In a GVGC, the GVS genes are comprised together with Gas Vesicle Assembly genes. The Gas Vesicle Assembly genes are genes encoding for GVA proteins. GVA proteins comprise proteins with various putative functions such as nucleators and/or chaperones as well as proteins with an unknown specific function related to the assembly of the GV.

In a prokaryotic cell, GVA genes are all the genes within one or more operons comprising at least one of a gvpN and a gvpF, excluding any gvpA/B and gvpC gene possibly present within said one or more operons. Therefore GVA genes can be identified by identifying an operon in a prokaryote including at least one of a gvpN and a gvpF, excluding any gvpA/B and gvpC gene.
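The selection rule above lends itself to a short illustrative sketch. It assumes that operons have already been annotated and are supplied as lists of plain gene names (without organism prefixes such as “Ana-”), which is a simplification. The function name `identify_gva_genes` and the example operon contents are hypothetical and serve only to show the rule.

```python
# Structural genes excluded from the GVA set, and the marker genes that
# qualify an operon, as stated in the text.
GVS_GENES = {"gvpA", "gvpB", "gvpC"}
MARKER_GENES = {"gvpN", "gvpF"}


def identify_gva_genes(operons: list[list[str]]) -> list[str]:
    """Return GVA genes: every gene in an operon that contains gvpN or gvpF,
    excluding gvpA/B and gvpC. Order of first appearance is preserved."""
    gva: list[str] = []
    for operon in operons:
        if MARKER_GENES & set(operon):  # operon contains gvpN and/or gvpF
            for gene in operon:
                if gene not in GVS_GENES and gene not in gva:
                    gva.append(gene)
    return gva


# Hypothetical annotation: one GV operon and one unrelated operon.
example_operons = [
    ["gvpA", "gvpC", "gvpN", "gvpJ", "gvpK", "gvpF", "gvpG", "gvpV", "gvpW"],
    ["lacZ", "lacY", "lacA"],
]
example_gva = identify_gva_genes(example_operons)
```

In this hypothetical input, the second operon contributes nothing because it contains neither marker gene, and the structural genes of the first operon are filtered out.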

Preferably the one or more operons comprising all the GVA genes of aprokaryote can be identified and detected by detecting a gvpN geneencoding for a GV protein consensus sequenceRALXYLQAGYXVHXRGPAGTGKTTLAMHLAXXLXRPVMLIXGDDEFXTSDLIGSESGYXXKKVVDNYIHSVVKVEDELRQNWVDNRLTXACREGFTLVYDEFNRSRPEXNNVLLS VLEEKILXLP(SEQ ID NO: 1) wherein X indicates any amino acid or a sequence of anylength having at least 50%, and more preferably 60% or higher, mostpreferably from 50% to 83% identity.
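As a minimal sketch of how a candidate sequence could be scored against a consensus containing wildcard positions, the following Python function computes percent identity over the non-X positions only. Treating X positions as unscored is an assumption of this sketch, as is the ungapped comparison; the function name `consensus_identity` is hypothetical. The example compares the first ten residues of SEQ ID NO: 1 with the corresponding fragment of SEQ ID NO: 11.

```python
def consensus_identity(segment, consensus, wildcard="X"):
    """Percent identity of segment against consensus, ignoring wildcards.

    Both strings must have the same length (ungapped comparison). Positions
    where the consensus carries the wildcard accept any amino acid and are
    left out of the identity calculation.
    """
    if len(segment) != len(consensus):
        raise ValueError("segment and consensus must have equal length")
    scored = [(s, c) for s, c in zip(segment, consensus) if c != wildcard]
    if not scored:
        raise ValueError("consensus contains only wildcard positions")
    matches = sum(1 for s, c in scored if s == c)
    return 100.0 * matches / len(scored)


# First ten residues of the SEQ ID NO: 1 consensus versus the aligned
# fragment of SEQ ID NO: 11 (7 of 9 scored positions match).
print(round(consensus_identity("RALSYLKSGY", "RALXYLQAGY"), 1))  # 77.8
```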

gvpN genes of various microorganisms have a sequence encoding for a gvpNprotein within the consensus SEQ ID NO: 1. In particular, gvpN gene inthe sense of the disclosure can be a gene encoding for sequenceMTVLTDKRKKGSGAFIQDDETKEVLSRALSYLKSGYSIHFTGPAGGGKTSLARALAKKRKRPVMLMHGNHELNNKDLIGDFTGYTSKKVIDQYVRSVYKKDEQVSENWQDGRLLEAVKNGYTLIYDEFTRSKPATNNIFLSILEEGVLPLYGVKMTDPFVRVHPDFRVIFTSNPAEYAGVYDTQDALLDRLITMFIDYKDIDRETAILTEKTDVEEDEARTIVTLVANVRNRSGDENSSGLSLRASLMIATLATQQDIPIDGSDEDFQTLCIDILHHPLTKCLDEENAKSKAEKIILEE CKNIDTEEK(SEQ ID NO: 11) or a sequence of any length having at least 30% sequenceidentity with respect to SEQ ID NO: 11, preferably at least 50%, andmore preferably 60% or higher,

and gvpF gene in the sense of the disclosure can be a gene encoding forsequence MSETNETGIYIFSAIQTDKDEEFGAVEVEGTKAETFLIRYKDAAMVAAEVPMKIYHPNRQNLLMHQNAVAAIMDKNDTVIPISFGNVFKSKEDVKVLLENLYPQFEKLFPAIKGKIEVGLKVIGKKEWLEKKVNENPELEKVSASVKGKSEAAGYYERIQLGGMAQKMFTSLQKEVKTDVFSPLEEAAEAAKANEPTGETMLLNASFLINREDEAKFDEKVNEAHENWKDKADFHYSGPWPAYNFVNIRLKVEEK (SEQ ID NO: 12) or a sequence of any length havingat least 20% sequence identity with respect to SEQ ID NO:12, preferablyat least 50%, more preferably 60%, and at least 70% or higher.

The term “operon” as described herein indicates a group of genes arranged in tandem in a prokaryotic genome as will be understood by a skilled person. Genes encoding proteins participating in a common pathway are typically organized together in an operon, as understood by those skilled in the art. Typically, genes of an operon are transcribed together into a single mRNA molecule referred to as polycistronic mRNA. Polycistronic mRNA comprises several open reading frames (ORFs), each of which is translated into a polypeptide. These polypeptides usually have a related function and their coding sequences are grouped and regulated together by a regulatory region containing a promoter and an operator. Typically, repressor proteins bound to the operator sequence can physically obstruct the RNA polymerase enzyme from binding the promoter, preventing transcription. An example of a prokaryotic operon is the lac operon, which natively regulates transport and metabolism of lactose in E. coli and many other enteric bacteria.

In an operon, each ORF typically has its own ribosome binding site (RBS) so that ribosomes simultaneously translate ORFs on the same mRNA. Some operons also exhibit translational coupling, where the translation rates of multiple ORFs within an operon are linked. This can occur when the ribosome remains attached at the end of an ORF and translocates along to the next ORF without the need for a new RBS. Translational coupling is also observed when translation of an ORF affects the accessibility of the next RBS through changes in RNA secondary structure.
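To illustrate the notion of several ORFs carried on one polycistronic mRNA, the following Python sketch locates successive ORFs in a transcript. It is a deliberate simplification: only ATG is treated as a start codon, ribosome binding sites are not modelled, ORFs are assumed not to overlap, and the function name `find_orfs` and the toy transcript are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(mrna):
    """Return (start, end) index pairs of successive ORFs, end exclusive.

    Each ORF runs from an ATG to the first in-frame stop codon (inclusive).
    The search for the next ORF resumes after the previous stop codon.
    """
    seq = mrna.upper().replace("U", "T")
    orfs = []
    i = 0
    while True:
        start = seq.find("ATG", i)
        if start == -1:
            break
        end = None
        for j in range(start + 3, len(seq) - 2, 3):
            if seq[j:j + 3] in STOP_CODONS:
                end = j + 3
                break
        if end is None:
            i = start + 1  # no in-frame stop; try the next ATG
        else:
            orfs.append((start, end))
            i = end
    return orfs


# Toy polycistronic transcript with two short ORFs separated by a spacer.
transcript = "CC" + "ATGAAATAA" + "GGAGG" + "ATGCCCGGGTGA"
print(find_orfs(transcript))  # [(2, 11), (16, 28)]
```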

In some embodiments, a GV cluster comprises one of gvpN or gvpF. In several embodiments GV clusters include both gvpN and gvpF as will be understood by a skilled person. In this connection, reference is made to Example 12 and FIGS. 20 and 21 of related application U.S. application Ser. No. 15/663,635 published as US 2018/0030501 incorporated herein by reference in its entirety, showing exemplary gas vesicle gene cluster operons [1, 2] comprising GVS and GVA genes and related exemplary configurations. In particular, as shown in Example 12 of related application U.S. application Ser. No. 15/663,635 published as US 2018/0030501, typically a native GV gene cluster has GVA genes comprising both gvpN and gvpF genes, even if native GV gene clusters having only a gvpN gene or only a gvpF gene are known, as understood by skilled persons.

Accordingly, for a certain prokaryote, GVA genes in the sense of the disclosure indicate all the genes that are comprised in the one or more operons having at least one of a gvpN and a gvpF herein described, excluding any Gas Vesicle Structural (GVS) genes of the prokaryote possibly comprised within the one or more operons.

Thus, GVA genes comprised in a gas vesicle gene cluster in a prokaryote can be identified for example by obtaining the genome sequence of the prokaryote of interest and performing a sequence alignment of the protein sequences encoded in the genome of the prokaryote of interest against a gvpN protein sequence and/or a gvpF protein sequence.

In particular, obtaining the genome sequence of the prokaryote of interest can be performed either using wet lab techniques identifiable by a skilled person upon reading of the present disclosure, or by retrieval from databases of gene and protein sequences also identifiable by a skilled person upon reading of the present disclosure. Performing a sequence alignment of the protein sequences encoded in the genome of the prokaryote of interest can be performed using Protein BLAST or other alignment algorithms identifiable by a skilled person. Exemplary gvpN and/or gvpF protein sequences that can be used in performing the alignment are sequences SEQ ID NO: 11 and/or SEQ ID NO: 12. In particular, a sequence alignment can be performed using gvpN and/or gvpF protein sequences from the closest phylogenetic relative to the prokaryote of interest. Reference is made to Example 2 showing exemplary phylogenetic relationships between gvpF and gvpN proteins of exemplary prokaryotic species. Accordingly, one or more operons that comprise the gvpN and/or gvpF genes can be identified, and any other gvps within the one or more operons can also be identified, wherein the other gvps are comprised in ORFs within the one or more operons, excluding any ORFs encoding gvpA/B or gvpC genes comprised in the one or more operons of the GV gene cluster.
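As a hedged illustration of the alignment step, the following Python sketch filters Protein BLAST hits reported in the standard 12-column tabular format (BLAST+ `-outfmt 6`). The query identifiers, locus tags and numerical values in the example are hypothetical; the per-query thresholds reuse the minimum identities stated above for SEQ ID NO: 11 (30%) and SEQ ID NO: 12 (20%).

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_hits(tabular_text, min_identity_by_query):
    """Return (subject, query, percent identity) for hits above threshold.

    min_identity_by_query maps a query identifier to its minimum percent
    identity; hits for queries missing from the mapping are discarded.
    """
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        record = dict(zip(FIELDS, row))
        threshold = min_identity_by_query.get(record["qseqid"])
        if threshold is not None and float(record["pident"]) >= threshold:
            hits.append((record["sseqid"], record["qseqid"],
                         float(record["pident"])))
    return hits


# Hypothetical BLAST output for a genome of interest.
rows = [
    ["gvpN_SEQ11", "locus_0101", "62.5", "240", "90", "2", "30", "269", "12", "250", "1e-90", "280"],
    ["gvpN_SEQ11", "locus_0450", "24.0", "100", "76", "1", "40", "139", "5", "104", "0.5", "30"],
    ["gvpF_SEQ12", "locus_0105", "35.1", "230", "149", "3", "5", "234", "3", "229", "2e-40", "140"],
]
text = "\n".join("\t".join(r) for r in rows)
print(filter_hits(text, {"gvpN_SEQ11": 30.0, "gvpF_SEQ12": 20.0}))
```

The loci retained by such a filter would then be mapped back to their operons so that the remaining ORFs of each operon can be inspected.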

Accordingly, GVA genes can also be identified based on the configuration of operons and gene clusters identified through homology (see e.g. Example 1) and phylogenesis (see e.g. Example 2), also using the gvpA/B, gvpN and/or gvpF sequences of SEQ ID NOs: 1, 3, and 11-12 herein provided, preferably the gvpA/B consensus of SEQ ID NO: 3 and the gvpN consensus of SEQ ID NO: 1. Reference is also made in this connection to Example 3, reporting exemplary GVGC configurations of naturally occurring Gas Vesicle gene clusters identified with the methods herein described and additional methods identifiable by a skilled person.

GVS genes of a GVGC of the disclosure, identified with methods herein indicated, typically comprise gvpA or gvpB, which have similar sequences and are equivalent in their purpose, and optionally gvpC. Exemplary sequences for gvpA and gvpB genes of GV gene clusters in the sense of the disclosure, which can also be used to identify additional GVS and GVGC through homology and alignment in addition to the use of the consensus sequence SEQ ID NO: 3, are reported in Example 4.

GVA genes of a GVGC of the disclosure, identified with methods herein indicated, typically comprise genes encoding proteins identified as gvpF, gvpG, gvpL, gvpS, gvpK, gvpJ, and gvpU. GVA genes and proteins can also comprise gvpR and gvpT (see e.g. B. megaterium GVA), gvpV and gvpW (see e.g. Anabaena flos-aquae and Serratia GVA) and/or gvpX, gvpY and gvpZ (see e.g. Serratia GVA). Preferably GVGC of the disclosure further comprise gvpN, which results in a more robust detection with many detection methods herein described. Exemplary sequences for GVA genes of GV gene clusters in the sense of the disclosure, which can also be used to identify additional GVAs and GVGC through homology and alignment, are reported in Example 4.

In GVGC herein described, the GVS genes and the GVA genes, co-expressed in connection with regulatory sequences capable of operating in a host cell, are configured to provide a GV type, with a different GVGC typically resulting in a different GV type.

The wording “GV type” in the sense of the disclosure indicates a gas vesicle having dimensions and shape resulting in distinctive mechanical, acoustic, surface and/or magnetic properties as will be understood by a skilled person upon reading of the present disclosure. In particular, a skilled person will understand that different shapes and dimensions will result in different properties in view of the indications provided in U.S. application Ser. No. 15/613,104 published as US2018/0028693 and U.S. Ser. No. 15/663,600 published as US2018/0038922 and additional indications identifiable by a skilled person. Typically, larger volume results in stronger per-particle scattering, smaller diameter generally results in higher collapse pressure after removal of gvpC, and different dimensions result in different ratios of T2/T2* relaxivity per volume-averaged magnetic susceptibility ([20]).

Accordingly, in embodiments herein described, GVGC can be selected based on desired properties of the corresponding GV type. In particular, to this extent, a skilled person can use naturally occurring GVGC, can provide engineered GVGC wherein some of the naturally occurring gvp genes are omitted, and/or can provide hybrid GVGC in which GVA and GVS genes of naturally occurring GVGCs are combined to provide GV types having the shape and dimensions resulting in the desired properties.

The term “hybrid gene cluster” or “hybrid cluster” as used herein indicates a cluster comprising at least two genes native to different species and resulting in a cluster not natively present in any organism. Typically, a hybrid gene cluster comprises a subset of gas vesicle genes native to a first bacterial species and another subset of gas vesicle genes native to one or more bacterial species, with at least one of the one or more bacterial species different from the first bacterial species. Accordingly, a hybrid GV gene cluster includes a combination of GV genes which is not native in any naturally occurring prokaryote.
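The definition above can be illustrated with a short Python sketch that records the species of origin of each gene in a cluster and reports the cluster as hybrid when more than one species is represented. The function names and the example cluster composition are hypothetical, and the sketch does not check that the combination is absent from naturally occurring prokaryotes.

```python
def genes_by_species(cluster):
    """Group the gene names of a cluster by species of origin.

    cluster maps a gene name to the species the gene is native to.
    """
    grouped = {}
    for gene, species in cluster.items():
        grouped.setdefault(species, []).append(gene)
    return grouped


def is_hybrid(cluster):
    """A cluster is treated as hybrid when its genes span two or more species."""
    return len(genes_by_species(cluster)) >= 2


# Hypothetical hybrid cluster: a structural gene from A. flos-aquae combined
# with assembly genes from B. megaterium.
cluster = {
    "gvpA": "A. flos-aquae",
    "gvpN": "B. megaterium",
    "gvpF": "B. megaterium",
    "gvpG": "B. megaterium",
}
print(is_hybrid(cluster))  # True
print(genes_by_species(cluster))
```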

In particular, identification of a desired GVGC for a target cell, and therefore of the ability of the corresponding gvp gene combination to result in production of functional GV proteins capable of assembling in a GV, thus providing a corresponding detectable GV type, can be performed through a testing method also directed to verify detectability of the GV by a detection method of choice. The testing method can be performed in the target cell where detection of the GV type is desired, or in testing cells having a cell environment equivalent to the cell environment of the target cell in terms of expression of GV genes and GV formation, which thus provide a model to verify the ability of the gvp genes to provide a GVGC for the target cells. The method to identify a desired GVGC comprises introducing a candidate GVGC into the target cell or testing cell; the introducing can be performed using engineered polynucleotide constructs contacted with the target cell or testing cell for a time and under conditions to allow expression of the GVGC and formation of the GV type (e.g. using the methods described in U.S. application Ser. No. 15/663,635 published as US 2018/0030501 incorporated herein by reference). The method further comprises detecting formation of a gas vesicle in the target cell or testing cell following the introducing with a pre-set method of detection. Pre-set methods of detection can be directed to detect acoustic and/or magnetic properties that are of interest in desired applications of the corresponding GV type. Preferably the testing can be performed in a target cell or testing cell that has been modified, either chemically or genetically, to have the same cellular turgor pressure as mammalian cells according to methods identifiable by a skilled person.

Experiments performed with GVGC herein described provide proof of principle that E. coli is an effective model for the ability of a GVGC to correctly assemble in a mammalian cell environment and that E. coli can therefore be used as a testing cell to identify a GVGC capable of being expressed in mammalian cells. Accordingly, detecting expression of a candidate GVGC in E. coli with a pre-set method is indicative of the ability of the corresponding GV proteins to form a GV type and of the GV type to correctly assemble and be detectable with the pre-set method in a mammalian cell.

In exemplary embodiments where a GV type is to be used in differential ultrasound imaging or image-subtracted ultrasound, the pre-set method of detection can comprise imaging with ultrasound a target site comprising the cell following the introduction of the GVGC, applying acoustic pressure to the target site at a pressure expected to collapse the GVs, and then imaging the target site with ultrasound again. The difference of the images (before and after collapse) shows if collapsing GVs (having a collapse threshold below the acoustic pressure) were present at the target site.
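The image subtraction step can be sketched in Python as follows. Images are represented as nested lists of pixel intensities in arbitrary units; the function names, the toy pixel values and the detection threshold are hypothetical and are chosen only to illustrate the before-minus-after comparison.

```python
def difference_image(pre_collapse, post_collapse):
    """Pixel-wise difference: image before collapse minus image after."""
    return [[a - b for a, b in zip(row_pre, row_post)]
            for row_pre, row_post in zip(pre_collapse, post_collapse)]


def collapsing_pixels(pre_collapse, post_collapse, threshold):
    """Return (row, column) positions whose signal dropped by more than threshold."""
    diff = difference_image(pre_collapse, post_collapse)
    return [(r, c)
            for r, row in enumerate(diff)
            for c, value in enumerate(row)
            if value > threshold]


# Toy 2x2 images: only the lower-left pixel loses signal after the
# collapsing pressure is applied.
pre = [[10, 12], [50, 11]]
post = [[9, 12], [8, 10]]
print(difference_image(pre, post))       # [[1, 0], [42, 1]]
print(collapsing_pixels(pre, post, 20))  # [(1, 0)]
```

The same subtraction applies to the MRI-based detection, with hydrostatic pressure as the collapsing stimulus.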

In exemplary embodiments where a GV type is to be used in MRI (magnetic resonance imaging), the pre-set method of detection can comprise imaging with MRI a target site comprising the cell following the introduction of the GVGC, and applying hydrostatic pressure to the target site at a pressure expected to collapse the GVs. The target site is then imaged with MRI again, and the difference of the images (before and after collapse) shows if collapsing GVs (having a collapse threshold below the hydrostatic pressure) were present at the target site.

In exemplary embodiments where a GV type is to be used in BURST (burst ultrasound reconstruction with signal templates) imaging described herein and in U.S. application Ser. No. 16/736,581 filed on Jan. 7, 2020 and herein incorporated by reference in its entirety, the pre-set method of detection can comprise imaging with ultrasound a target site comprising the cell following the introduction of the GVGC, over successive frames, at a peak positive pressure (PPP) well below the expected collapse threshold pressure for the GVs. While the frames are being taken, the PPP is increased step-wise to a value over the expected collapse threshold pressure for at least 9 half-cycles. Frames from before, during, and after the application of the increased pressure undergo template mixing to detect a BURST signal from the collapsing GVs, if present.

Additional methods of detection such as Transmission Electron Microscopy (TEM), optical scattering, optical phase detection, and xenon hyperCEST MRI can be used.

An exemplary method of detection of a functional GVGC in the sense of the disclosure performed in E. coli is reported in Example 5 of the present disclosure. Additional methods to be performed in other prokaryotic cells and/or mammalian cells using the GVES of the disclosure can be identified by a skilled person upon reading of the present disclosure.

Several GVGC detectable with one or more detection methods of interest have been identified and can be used for production of GV types in various cells through various genetically engineered constructs as will be understood by a skilled person upon reading of the present disclosure and U.S. application Ser. No. 15/663,635 published as US 2018/0030501 herein incorporated by reference in its entirety.

In some embodiments described herein, GVGC of the instant disclosure can be a naturally occurring combination of gvp genes, which can have a naturally occurring sequence or a sequence modified to optimize the expression in the cell where detection is to be performed. For example, GVGC of the instant disclosure comprise a GVGC of B. megaterium formed by the gvpA or gvpB genes, gvpR, gvpN, gvpF, gvpG, gvpL, gvpS, gvpK, gvpJ, gvpT, gvpU of B. megaterium, or the GVGC of Anabaena flos-aquae formed by the gvpA or gvpB genes of Anabaena flos-aquae (see e.g. the sequences in Table 6 of Example 4) and the gvpC, gvpN, gvpJ, gvpK, gvpF, gvpG, gvpV, gvpW genes of Anabaena flos-aquae (see e.g. sequences in Table 10 of Example 4).

One or more of the gvp genes of the GVGC of the present disclosure can have a naturally occurring sequence or a sequence modified to optimize the expression in the cell where detection is to be performed. For example, a B. megaterium GVGC can have gvpA or gvpB genes having the sequences in Table 6 of Example 4, and/or any one of the gvpR, gvpN, gvpF, gvpG, gvpL, gvpS, gvpK, gvpJ, gvpT, gvpU genes having the sequences in Table 8 of Example 4. Similarly, an Anabaena flos-aquae GVGC can have the gvpA or gvpB genes having the sequences reported in Table 6 of Example 4 and/or any one of the gvpC, gvpN, gvpJ, gvpK, gvpF, gvpG, gvpV, gvpW genes having the sequences reported in Table 10 of Example 4.

In some embodiments described herein, GVGC of the instant disclosure can be modified versions of naturally occurring GV gene clusters. An example is provided by the GVGC of B. megaterium comprising gvpB, gvpR, gvpN, gvpF, gvpG, gvpL, gvpS, gvpK, gvpJ, gvpT, gvpU wherein the gvpR and gvpT genes of the naturally occurring GVGC from B. megaterium have been omitted (see e.g. the sequences reported in Example 6 and Table 9 of the instant disclosure). Another example is provided by GV gene clusters comprising gvpA, Ana-gvpC, gvpN, gvpJ, gvpK, gvpF, gvpG, gvpW, and gvpV from Anabaena flos-aquae or GV gene clusters comprising gvpA, gvpN, gvpJ, gvpK, gvpF, gvpG, gvpW, gvpV from Anabaena flos-aquae (see Anabaena flos-aquae genes in Table 4 and Table 10 of Example 4 of the present disclosure).

In other embodiments described herein, GVGC of the instant disclosure can be a hybrid GV gene cluster which, in a Gas Vesicle expression system of the disclosure, can comprise a combination of genes from A. flos-aquae (herein also Ana-gvp) and genes from B. megaterium (herein also Mega-gvp). In particular, in exemplary embodiments, the hybrid GV gene cluster can comprise B. megaterium GVA genes gvpR, gvpN, gvpF, gvpG, gvpL, gvpS, gvpK, gvpJ, gvpT and gvpU and further comprise the structural gvpA gene from Anabaena flos-aquae. In some of those embodiments, the hybrid GV gene cluster can comprise gvpA and gvpC from Anabaena flos-aquae and GVA genes from B. megaterium possibly excluding gvpR and/or gvpT. In some of those embodiments, the hybrid GV gene cluster can comprise Ana-gvpA and Mega GVA genes possibly excluding gvpR and/or gvpT. In some embodiments GVGC of the instant disclosure can include gvpA, gvpC, gvpN from Anabaena flos-aquae and GVA genes from B. megaterium, as well as other combinations identifiable by a skilled person upon reading of the present disclosure.

In some embodiments herein described, a GVGC comprises gvp genes A/B, C and N (gvpA/B, gvpC, gvpN genes) from a same or different prokaryote. Preferably the GVGC comprises a gvpN gene, as presence of gvpN protein results in an increased detectability of the related GV type.

For example, in one exemplary embodiment, all the gvp genes B, N, F, G, L, S, K, J and U are from B. megaterium. GVs from B. megaterium are typically cone-tipped cylindrical structures with a diameter of approximately 73 nm and length of 100-600 nm, encoded by a cluster of eleven or fourteen different genes, including the primary structural protein, gvpB, and several putative minor components and putative chaperones [21, 22] as would be understood by a person skilled in the art.

In some embodiments, some of the set of nine gvp genes can be from Bacillus megaterium and the remaining genes from Anabaena flos-aquae, such as the GVGC comprising Ana-A, Ana-C, Ana-N, mega: gvpF, gvpG, gvpL, gvpS, gvpK, gvpJ, gvpT and gvpU with/without gvpR and gvpT, and additional examples identifiable by a skilled person upon reading of the present disclosure (see Example 4 and Example 5 of the present disclosure).

In embodiments herein described, the sequence of at least one gvp gene can be modified with respect to the naturally occurring sequence to improve the related expression (e.g. to be codon optimized) and/or the inclusion in the GVES of the disclosure (e.g. by modification of the N- and/or C-terminal portions to allow the use of linkers or other elements to be included in a cassette or construct of the disclosure).

In some embodiments, the GVGC can comprise Serratia gvp genes, as Serratia gvp genes can express functional GV proteins in E. coli, as reported in literature ([23] [24]).

GVES and related constructs have been herein provided based on the surprising finding that a naturally occurring or engineered GVGC (e.g. modified to remove or add gvp genes, to include one or more gvp genes with a modified sequence, and/or to include gvp genes from different prokaryotes to provide a hybrid cluster) which is functional in E. coli can be expressed in mammalian cells on an engineered polynucleotide construct specifically configured to allow expression in the mammalian cell of GV proteins encoded by the GVGC, resulting in formation of a corresponding GV type in the mammalian cell.

The term “mammalian cell” refers to cells from a mammalian tissue, comprising cells within a mammalian host and cells isolated from a mammal and expanded in culture for use as therapeutic and research tools. Exemplary mammalian cells that can express GVES of the disclosure are primary cells (cells that are directly harvested from an animal) genetically engineered to express GVs. Exemplary mammalian cell cultures that can be genetically engineered with GV constructs described herein configured to allow expression of GVs comprise HEK 293T, HEK293, CHO-K1, N2A, HeLa, Jurkat, NIH3T3 cells, and others identifiable by those skilled in the art.

In particular, in accordance with the disclosure it has been surprisingly found that naturally occurring, modified and/or hybrid GVGC can be expressed in a mammalian cell if expression of the gvpA or gvpB gene is performed in a gene expression cassette separated from the one or more gene expression cassettes used to express the remaining GV genes of the GV gene cluster to be expressed. Also, it has surprisingly been found that gvp genes of a GV gene cluster other than gvpA and/or gvpB can be expressed in a mammalian cell in a single gene expression cassette provided that each gvp gene is separated from another in the same cassette by a separation element encoding a separation peptide, possibly in combination with at least one booster cassette to increase expression of bottleneck genes in the GVGC.
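The cassette arrangement described above can be sketched in Python as follows. The sketch places gvpA or gvpB in a cassette of its own and joins the remaining gvp genes in a single cassette with a separation element between consecutive genes. The function name `design_cassettes` and the placeholder token `SEP` are hypothetical; promoters, other regulatory regions and booster cassettes are not modelled.

```python
PRIMARY_STRUCTURAL = ("gvpA", "gvpB")


def design_cassettes(genes, separator="SEP"):
    """Split a gene list into expression cassette layouts.

    Returns a list of cassettes, each a list of elements in 5' to 3' order.
    gvpA/gvpB genes each receive their own cassette; all other genes share
    one cassette, interleaved with the separation element.
    """
    cassettes = [[g] for g in genes if g in PRIMARY_STRUCTURAL]
    others = [g for g in genes if g not in PRIMARY_STRUCTURAL]
    if others:
        shared = []
        for index, gene in enumerate(others):
            if index:
                shared.append(separator)  # separation element between genes
            shared.append(gene)
        cassettes.append(shared)
    return cassettes


print(design_cassettes(["gvpB", "gvpN", "gvpF", "gvpG"]))
# [['gvpB'], ['gvpN', 'SEP', 'gvpF', 'SEP', 'gvpG']]
```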

The term “gene cassette” as used herein indicates a mobile genetic element that contains at least one gene and a recombination site. Accordingly, a gene cassette can contain a single gene or multiple genes possibly organized in an operon structure. A gene cassette can be transferred from one DNA sequence (usually on a vector) to another by ‘cutting’ the fragment out using restriction enzymes or transposase, CRISPR, viral and/or recombinase enzymes and other nucleases and ‘pasting’ it back into the new context, or by other molecular biology and cloning techniques (e.g. PCR, CRISPR, TALENs, ZFN). Gene cassettes can move around within an organism's genome or be transferred to another organism in the environment via horizontal gene transfer.

A “gene expression cassette” is a gene cassette comprising regulatory sequences enabling its expression by a transfected cell. Following transformation, the expression cassette directs the cell's machinery to make RNA and proteins. Some expression cassettes are designed for modular cloning of protein-encoding sequences so that the same cassette can easily be altered to make different proteins. An expression cassette is composed of one or more genes and the sequences controlling their expression. An expression cassette typically comprises at least three components: a promoter sequence, an open reading frame, and a 3′ untranslated region that, in eukaryotes, usually contains a polyadenylation site. An expression cassette can be formed by a manipulable fragment of DNA carrying, and capable of expressing, one or more genes of interest optionally located between one or more sets of restriction sites. Gene expression cassettes as used herein typically comprise further regulatory sequences additional to the promoter to regulate the expression of the gene or genes within the open reading frame, herein also indicated as coding region of the cassette.

In particular, in embodiments of the GVES herein described, the gene expression cassettes of the system comprise one or more gvp genes under control of regulatory sequences capable of operating in the mammalian host and are thus configured to provide a GV type in the mammalian cell.

The term “regulatory sequence” or “regulatory region” as described herein indicates a segment of a nucleic acid molecule which is capable of increasing or decreasing transcription or translation of a gene within an organism either in vitro or in vivo. In particular, coding regions of the GV genes herein described comprise one or more protein coding regions which when transcribed and translated produce a polypeptide. Regulatory regions of a gene herein described comprise promoters, transcription factor binding sites, operators, activator binding sites, repressor binding sites, enhancers, protein-protein binding domains, RNA binding domains, DNA binding domains, silencers, insulators and additional regulatory regions that can alter gene expression in response to developmental and/or external stimuli as will be recognized by a person skilled in the art.

The term “operative connection” as used herein indicates an arrangement of elements in a combination enabling production of an appropriate effect. With respect to genes and regulatory sequences, an operative connection indicates a configuration of the genes with respect to the regulatory sequences allowing the regulatory sequences to directly or indirectly increase or decrease transcription or translation of the genes.

Regulatory sequences used in gene expression cassettes herein described, identified herein also as mammalian regulatory regions, are configured to operate in a mammalian cell.

Exemplary regulatory regions capable of operating in mammalian cells comprise promoters, enhancers, silencers, terminators, regulators, operators, ribosome binding/entry sites, and riboswitches, among others known in the art. Regulatory regions capable of operating in a mammalian host can be selected by a skilled person following selection of the mammalian host of interest. Exemplary constitutive and inducible mammalian promoters and operators suitable for regulating expression of GVs in a mammalian host comprise those described herein and others identifiable by those skilled in the art.

Mammalian regulatory regions comprised in a gene expression cassette herein described typically comprise a mammalian promoter, 5′UTR regions, 3′UTR regions, and a terminator as will be understood by a skilled person.

A “mammalian promoter” in the sense of the disclosure is a region of DNA, suitable for gene expression in a mammalian cell, that leads to initiation of transcription of a particular gene. Promoters are typically located on the same strand and upstream on the DNA sequence (towards the 5′ region of the sense strand), adjacent to the transcription start site of the genes whose transcription they initiate. In mammalian cells, promoters typically comprise the eukaryotic TATA (SEQ ID NO:13) box. Promoters can typically be about 100-1000 base pairs long. In particular, promoters that can be used in gene expression cassettes herein described can be constitutive promoters or conditional promoters.

The term “conditional promoter” refers to a promoter with activity regulatable or controlled by endogenous transcription factors or exogenous inputs such as chemical or thermal inducers or optical induction. Examples of mammalian conditional promoters include inducible promoters based on exogenous agents such as TET (tetracycline-response elements, TET-ON/TET-OFF), Lac, dCas-transactivator, Zinc-finger-TF, TALENs-ZF, Gal4-UAS, synNotch, and inducible promoters based on endogenous signals such as TNF-alpha, cFOS and others identifiable to a skilled person.

The term “constitutive promoter” refers to an unregulated promoter that allows for continual transcription of its associated genes. Exemplary mammalian constitutive promoters that can be used for expression in mammalian cells include CMV from human cytomegalovirus, EF1a from human elongation factor 1 alpha, SV40 from the simian vacuolating virus 40, PGK1 from phosphoglycerate kinase gene, Ubc from human ubiquitin C gene, human beta actin, CAAG, SynI and others identifiable to those skilled in the art.

The wording “5′UTR region” refers to the region upstream from the initiation codon as will be understood by a person of ordinary skill in the art and is therefore outside the coding region of the cassette. The 5′UTR region can contain a Kozak sequence. The Kozak sequence used herein refers to a nucleic acid motif that functions as the protein translation initiation site in most eukaryotic mRNA transcripts as will be understood by a person skilled in the art. The Kozak sequence is located approximately 6 nucleotides upstream of the ATG start codon. Exemplary Kozak sequences include GCCACCATG (SEQ ID NO: 475), TTCACCATG (SEQ ID NO: 476), (CCC)TTCACCATG (SEQ ID NO: 477), the consensus sequence XXX[A/G]XXATG (SEQ ID NO: 478) wherein X indicates any nucleotide, and additional sequences identifiable by a skilled person.
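The consensus XXX[A/G]XXATG can be checked with a regular expression, as in the following Python sketch. The function names are hypothetical, X is taken to mean any of A, C, G or T, and RNA input is converted to the DNA alphabet before matching.

```python
import re

# XXX[A/G]XXATG, with X standing for any nucleotide.
KOZAK_CONSENSUS = re.compile(r"[ACGT]{3}[AG][ACGT]{2}ATG")


def kozak_positions(sequence):
    """Return start indices of non-overlapping matches to the consensus."""
    dna = sequence.upper().replace("U", "T")
    return [match.start() for match in KOZAK_CONSENSUS.finditer(dna)]


def has_kozak(sequence):
    """True when the sequence contains at least one consensus match."""
    return bool(kozak_positions(sequence))


print(has_kozak("GCCACCATG"))  # True  (SEQ ID NO: 475)
print(has_kozak("TTCACCATG"))  # True  (SEQ ID NO: 476)
print(has_kozak("GCCTCCATG"))  # False (T at the [A/G] position)
```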

The “3′UTR region” refers to an untranslated region that immediately follows the translation termination codon and is therefore outside the coding region of the cassette. The 3′UTR region often contains regulatory regions that post-transcriptionally influence gene expression. Regulatory regions within the 3′UTR can influence polyadenylation, translation efficiency, localization, and stability of the mRNA as will be understood by a person skilled in the art. In some embodiments, the 3′UTR contains silencer regions which are configured to bind to repressor proteins and inhibit the expression of the mRNA.

A “terminator” as used herein indicates a sequence-based element that defines the end of a transcriptional unit and initiates the process of releasing the synthesized mRNA. Exemplary mammalian terminators include polyadenylation sites. A “polyadenylation site” indicates an element targeted by polyadenylation enzymes such as CPSF and typically comprises the sequence AAUAAA (SEQ ID NO: 14) on the RNA. Polyadenylation sites will result in cleavage of the transcribed RNA 10-30 nucleotides downstream of the site, and addition of a poly(A) tail located at the end of the 3′UTR as will be understood by a person skilled in the art. In a gene expression cassette the poly(A) site can include the SV40 polyadenylation element, the hGH poly(A) signal, and other poly(A) signals that have the canonical AAUAAA (SEQ ID NO: 14) region as will be understood by a skilled person.
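The relationship between the AAUAAA signal and the expected cleavage region can be illustrated with the following Python sketch. It assumes that the 10-30 nucleotide distance is measured from the end of the signal, which is a choice made for this illustration; the function name and the toy transcript are hypothetical.

```python
def polya_cleavage_windows(rna, signal="AAUAAA", min_offset=10, max_offset=30):
    """Locate poly(A) signals and the window where cleavage is expected.

    Returns (signal_start, window_start, window_end) tuples, with the window
    given as indices min_offset to max_offset nucleotides past the end of
    the signal, truncated at the end of the transcript. DNA input is
    converted to the RNA alphabet first.
    """
    seq = rna.upper().replace("T", "U")
    windows = []
    start = seq.find(signal)
    while start != -1:
        signal_end = start + len(signal)
        window_start = signal_end + min_offset
        window_end = min(signal_end + max_offset, len(seq))
        if window_start <= len(seq):
            windows.append((start, window_start, window_end))
        start = seq.find(signal, start + 1)
    return windows


# Toy transcript: signal at index 2 followed by 40 nucleotides.
transcript = "GG" + "AAUAAA" + "C" * 40
print(polya_cleavage_windows(transcript))  # [(2, 18, 38)]
```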

In some embodiments, a gene expression cassette can include additional mammalian regulatory regions configured to increase or decrease the expression of the GV coding regions of the cassette, as will also be understood by a skilled person.

Exemplary mammalian regulatory sequences increasing transcription of the operatively linked gene comprise enhancers that can be located more distally from the transcription start site compared to promoters, and either upstream or downstream from the regulated genes, as understood by those skilled in the art. Enhancers are typically short (50-1500 bp) regions of DNA that can be bound by transcriptional activators to increase transcription of a particular gene. Typically, enhancers can be located up to 1 Mbp away from the gene, upstream or downstream from the start site. An exemplary additional mammalian regulatory region directed to enhance the expression levels of the GV genes is the Woodchuck Hepatitis Virus Posttranscriptional Regulatory Element (WPRE), placed downstream of the genes between the GV gene and the poly(A) tail. The WPRE and WPRE-like elements (e.g. RE of Hepatitis B virus (HPRE)) are known to increase transgene expression from a variety of viral vectors.

Exemplary mammalian regulatory sequences decreasing transcription of the operatively linked gene comprise RNAi/miRNA/shRNA sites that can be located upstream or downstream of the GV genes to control mRNA translation or degradation. For example, by binding to specific sites within the 3′UTR, miRNAs can decrease gene expression of various mRNAs by either inhibiting translation or directly causing degradation of the transcript.

Additional mammalian regulatory sequences that can be included in a gene expression cassette include post-transcriptional regulatory sequences such as riboswitches, typically present in eukaryotic untranslated regions (UTRs) of encoded RNAs. These sequences are configured to switch between alternative secondary structures in the RNA depending on the concentration of key metabolites. The secondary structures then either block or reveal other regulatory sequence regions, such as binding sites for RNA binding proteins. Further examples of additional post-transcriptional regulatory sequences comprise aptazymes, which are fusions composed of an aptamer domain and a self-cleaving ribozyme and which can be used for conditional gene expression to control mRNA levels with small molecules (e.g. tetracycline).

In general, selection of promoters and other regulatory sequences to be included in expression polynucleotidic constructs comprised in a GVES of the present disclosure can be performed by one or more of the following: detecting functionality of a promoter and/or additional regulatory sequence in the host cells; selecting promoters and/or additional regulatory sequences known to be functional in the host cells; detecting the strength of the promoters and/or additional regulatory sequences in connection with protein production and/or selecting promoters and/or additional regulatory sequences of known strength; and selecting inducible promoters and/or additional regulatory sequences to control GV expression.

Mammalian regulatory sequences can be provided in any configuration which is directed to provide a desired expression of the GV protein in the coding regions. For example, the 3′ end of the UTR of a gene expression cassette can comprise a polyA site only, a WPRE together with a polyA site, or a WPRE only. A combination of WPRE and polyA site is expected to result in the highest expression (highest copy number of translated protein). Additional configurations can be identified by a skilled person.

In embodiments of the GVES herein described, GV genes other than gvpA/B can be provided in a single gene expression cassette in various combinations and in any order, to the extent that when the cassette comprises two or more gvp genes other than gvpA/B, the two or more gvp genes are configured to have each GV gene linked to another by a separation element.

A separation element as used herein refers to an element that can be placed between two adjacent coding genes, allowing for separate transcription or translation of the two adjacent coding genes.

In some embodiments, a separation element can be an internal ribosome entry site (“IRES”). An internal ribosome entry site (IRES) as used herein refers to an element that allows for translation initiation in a cap-independent manner. In some embodiments herein described, an IRES element is placed between two coding genes to allow for initiation of translation from an internal region of the mRNA. It allows the coordinated expression of two genes using the same promoter in a single gene cassette, as will be understood by a person skilled in the art. Thus, the genes separated by an IRES can be expressed from a bicistronic mRNA without requiring either cleavage of a polyprotein or generation of a monocistronic mRNA.

Internal ribosome entry sites are approximately 450 nucleotides in length and are characterized by moderate conservation of primary sequence and strong conservation of secondary structure. The most significant primary sequence feature of the IRES is a pyrimidine-rich site whose start is located approximately 25 nucleotides upstream of the 3′ end of the IRES. Detailed information on IRES can be found in Jackson, et al., Trends Biochem. Sci., vol. 15, No. 12, pp. 477-483, 1990.

Examples of IRES known in the art include IRES obtainable from picornavirus and IRES obtainable from viral or cellular mRNA sources such as, for example, immunoglobulin heavy-chain binding protein (BiP), the vascular endothelial growth factor (VEGF) (Huez et al. (1998) Mol. Cell. Biol. 18(11):6178-6190), the fibroblast growth factor 2 (FGF-2), and insulin-like growth factor (IGFII), the translational initiation factor eIF4G and yeast transcription factors TFIID and HAP4, the encephalomyocarditis virus (EMCV) which is commercially available from Novagen (Duke et al. (1992) J. Virol 66(3):1602-9) and the VEGF IRES (Huez et al. (1998) Mol Cell Biol 18(11):6178-90). IRES have also been reported in different viruses such as cardiovirus, rhinovirus, aphthovirus, HCV, Friend murine leukemia virus (FrMLV) and Moloney murine leukemia virus (MoMLV). As used herein, IRES encompasses functional variations of IRES sequences as long as the variation is able to promote direct internal ribosome entry to the initiation codon of a cistron.

In some embodiments, a separation element is a post-translation cleavage element comprising a cleavage site sequence. A post-translation cleavage element is typically placed between two adjacent coding genes.

In some embodiments, the post-translation cleavage element comprises a 2A element. The term “2A element” or “2A sequence” refers to a post-translational or co-translational processing cleavage site sequence. The 2A sequence can be a DNA sequence or the peptide expression product of the DNA sequence. The latter is referred to as the 2A peptide. The 2A peptides are known to function by making the ribosome skip the synthesis of a peptide bond at the C-terminus of a 2A element, leading to separation between the end of the 2A sequence and the next peptide downstream. The cleavage occurs between the Glycine and Proline residues found at the C-terminus, meaning that the upstream cistron will have a few additional residues added to its end, while the downstream cistron will start with the Proline. The 2A elements used herein are placed between two adjacent GV coding genes. Exemplary 2A peptides are listed in Table 1 below:

TABLE 1
Exemplary 2A peptide sequences

Name     Sequence                   SEQ ID NO
P2A      ATNFSLLKQAGDVEENPGP        15
T2A      EGRGSLLTCGDVEENPGP         16
E2A      QCTNYALLKLAGDVESNPGP       17
F2A      VKQTLNFDLLKLAGDVESNPGP     18
BmCPV    DVFRSNYDLLKLCGDIESNPGP     19
BmIFV    TLTRAKIEDELIRAGIESNPGP     20

In Table 1, the bold residues are the consensus residues among each type of 2A element (P2A, T2A, E2A or F2A). In each 2A element of Table 1, the cleavage occurs between the last G and P. In some embodiments, a linker sequence such as a GAPGSG linker (SEQ ID NO: 21) is optionally placed between a GV coding gene and the 2A sequence, wherein any linker sequence such as GSG, GSGSG (SEQ ID NO: 2), SGS, and other linkers identifiable by a skilled person can be used. For example, a polynucleotide construct can comprise from 5′ to 3′ GV gene 1-GAPGSG-2A sequence-GV gene 2.
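The effect of the G/P separation on the translated products can be illustrated with a minimal Python sketch. The dictionary reproduces the P2A, T2A, E2A and F2A sequences of Table 1; the function name `split_at_2a` and the short placeholder proteins in the usage example are hypothetical and introduced only for illustration. The sketch treats separation as complete, which is a simplifying assumption.

```python
import re

# 2A peptide sequences as listed in Table 1.
TWO_A_PEPTIDES = {
    "P2A": "ATNFSLLKQAGDVEENPGP",
    "T2A": "EGRGSLLTCGDVEENPGP",
    "E2A": "QCTNYALLKLAGDVESNPGP",
    "F2A": "VKQTLNFDLLKLAGDVESNPGP",
}

def split_at_2a(polyprotein):
    """Split a polyprotein at every 2A element, between the final Gly and Pro.

    The upstream product keeps the 2A residues up to and including the Gly;
    the downstream product starts with the Pro.
    """
    cut_positions = []
    for peptide in TWO_A_PEPTIDES.values():
        for match in re.finditer(re.escape(peptide), polyprotein):
            cut_positions.append(match.end() - 1)  # index of the final Pro
    products, start = [], 0
    for cut in sorted(cut_positions):
        products.append(polyprotein[start:cut])
        start = cut
    products.append(polyprotein[start:])
    return products

# Hypothetical placeholder proteins joined as: protein 1 - GSG linker - P2A - protein 2.
polyprotein = "MAAA" + "GSG" + TWO_A_PEPTIDES["P2A"] + "MBBB"
print(split_at_2a(polyprotein))
```

The output makes the asymmetry explicit: the upstream protein carries the linker and most of the 2A peptide as a C-terminal extension, whereas the downstream protein gains only an N-terminal Proline.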

In some embodiments, the post-translation cleavage element comprises a cleavage recognition site that can be targeted and subsequently cleaved by protease enzymes. Exemplary protease enzymes include TEV, HCV NS3/5 protease, HIV protease, CMV protease, and HSV protease.

The term “protease cleavage site” in the sense of the disclosure indicates target sites for proteolytic cleavage by enzymes such as peptidases, proteases or proteolytic cleavage enzymes which break the peptide bond between amino acids in proteins. The general nomenclature of cleavage site positions of the substrate was formulated by Schechter and Berger, 1967 [25] and Schechter and Berger, 1968 [26]. Accordingly, the cleavage site is designated between P1-P1′, incrementing the numbering in the N-terminal direction of the cleaved peptide bond (P2, P3, P4, etc.). On the carboxyl side of the cleavage site the numbering is incremented in the same way (P1′, P2′, P3′ etc.).

Protease cleavage sites that can be inserted in engineered microcompartment proteins of the disclosure comprise regions up to 25 residues. In particular, protease cleavage sites are inserted in a configuration which makes them surface accessible. In some embodiments protease cleavage sites are included in an unstructured segment or within an alpha helical or beta sheet secondary structured segment. Exemplary protease cleavage sites that can be inserted in engineered microcompartment proteins herein described comprise TEV protease cleavage sites with sequence ENLYFQG (SEQ ID NO: 25), which is unstructured, and others identifiable by a skilled person upon reading of the present disclosure (see Table 2).

Recognition sequences and cleavage sites of exemplary proteases are shown in Table 2. A forward slash (/) indicates where the protease cleaves the protein sequence.

TABLE 2
Recognition sequences and cleavage sites of exemplary proteases

Enzyme Name                                   Sequence and Cleavage    SEQ ID NO
Human Rhinovirus (HRV) 3C Protease            LEVLFQ/GP                22
Enterokinase                                  DDDDK/                   23
Factor Xa                                     IEGR/                    24
Tobacco etch virus protease (TEV protease)    ENLYFQ/G                 25
Thrombin                                      LVPR/GS                  26
NS3/4A                                        DLEVVT/STWV              27
NS4A/4B                                       DEMEEC/ASHL              28
NS4B/5A                                       DCSTPC/SGSW              29
NS5A/5B                                       EDVVCC/SMSY              30
NS4A/4B                                       DEMEEC/SQH               31
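The slash notation of Table 2 can be applied programmatically, as in the following minimal Python sketch. The dictionary reproduces a subset of the Table 2 entries; the function name `cleave` and the example substrate are hypothetical and introduced only for illustration. The sketch assumes complete cleavage at every exact match of the recognition sequence and ignores accessibility and sequence-context effects.

```python
# Recognition sequences from Table 2; "/" marks the scissile bond (between P1 and P1').
PROTEASE_SITES = {
    "HRV 3C": "LEVLFQ/GP",
    "Enterokinase": "DDDDK/",
    "Factor Xa": "IEGR/",
    "TEV": "ENLYFQ/G",
    "Thrombin": "LVPR/GS",
}

def cleave(protein, protease):
    """Return the fragments produced by cutting at every exact recognition site."""
    n_side, c_side = PROTEASE_SITES[protease].split("/")
    motif = n_side + c_side
    fragments, start = [], 0
    position = protein.find(motif)
    while position != -1:
        cut = position + len(n_side)  # cut after the P1 residue
        fragments.append(protein[start:cut])
        start = cut
        position = protein.find(motif, cut)
    fragments.append(protein[start:])
    return fragments

# Hypothetical substrate: two placeholder segments joined by a TEV site.
print(cleave("MKTAENLYFQGSHHH", "TEV"))
```

Running the sketch on the placeholder substrate yields an N-terminal fragment ending in ENLYFQ and a C-terminal fragment beginning with G, consistent with the P1-P1′ convention described above.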

In some embodiments, the cleavage recognition site comprises a TEV protease cleavable sequence that can be placed between two GV coding genes when the TEV enzymes are co-expressed. The TEV peptide can be cleaved to release the two GV proteins.

In some embodiments, the cleavage recognition site comprises a recognition sequence targeted by one or more of the non-structural proteins NS3, NS4A, NS4B and NS5.

In some embodiments herein described, the post-translation cleavage element comprises an intein or hedgehog family auto-processing domains or variants thereof, inserted in an open reading frame between multiple coding genes. The term “intein” refers to the protein equivalent of gene introns which facilitate protein splicing. The intein element contains the necessary components needed to catalyze protein splicing and often contains an endonuclease domain that participates in intein mobility (Perler, F. B., et al., Nucleic Acids Research 1994, 22, 1127).

The Hedgehog family auto-processing domains used herein comprise the hedgehog protein carboxy-terminal autocatalytic domain HhC. As a person skilled in the art will understand, the hedgehog (“Hh”) proteins are composed of two domains: an amino-terminal domain HhN, which has the biological signal activity, and a carboxy-terminal autocatalytic domain HhC, which cleaves Hh into two parts in an intramolecular reaction and adds a cholesterol moiety to the HhN. HhC has sequence similarity to the self-splicing inteins; the shared region is termed Hint. New classes of proteins containing the Hint domain have been discovered in bacteria and eukaryotes.

As a person skilled in the art will understand, the sequences of the inserted auto-processing polypeptides or cleavage sites can be manipulated to enhance the efficiency of expression of the separate proteins.

Accordingly, in some embodiments, the constructs encoding gvpN, gvpF, gvpG, gvpL, gvpS, gvpK, gvpJ, and gvpU genes are comprised in a single polynucleotide. For example, all of the gvpN, gvpF, gvpG, gvpL, gvpS, gvpK, gvpJ, and gvpU genes can be provided in one open reading frame, operatively connected and under regulatory control of the same promoter. In an exemplary embodiment, gvpN, gvpF, gvpG, gvpL, gvpS, gvpK, gvpJ, and gvpU genes from B. megaterium are comprised in a single polycistronic construct (see e.g. construct of Example 8).

In some embodiments, the constructs encoding gvpN, gvpF, gvpG, gvpL, gvpS, gvpK, gvpJ, and gvpU genes are comprised in more than one polynucleotide. For example, a subset of the gvpN, gvpF, gvpG, gvpL, gvpS, gvpK, gvpJ, and gvpU genes is comprised in one cassette in which the genes are operatively connected and under regulatory control of a first promoter, whereas another subset of the gvpN, gvpF, gvpG, gvpL, gvpS, gvpK, gvpJ, and gvpU genes is comprised in another construct, operatively connected and under regulatory control of a second promoter. Each construct can be a polycistronic construct when comprising two or more coding genes. For example, one subset of Ana gvpN, gvpJ, gvpK, gvpF, gvpG, gvpW, gvpV can be on a polynucleotide and another subset of Ana gvpN, gvpJ, gvpK, gvpF, gvpG, gvpW, gvpV can be on another construct, either as monocistronic constructs or as polycistronic constructs, as will be understood by a skilled person.

The term “polycistronic construct” as used herein refers to a construct capable of simultaneously translating multiple genes from a single transcript, as will be understood by a person skilled in the art, within a single cassette or in different cassettes on the construct if the cassettes are separated by an internal ribosome entry site.

In some embodiments, the polycistronic construct can be a bicistronic construct which comprises two genes separated by an Internal Ribosome Entry Site (IRES) element which allows for initiation of translation from an internal region of the mRNA. Use of an IRES allows for the upstream protein to remain pristine while the downstream protein gets a MATT peptide addition to its N terminus. The second protein may be expressed at a lower level compared with the first protein, since the ribosome entry site is less efficient than the 5′cap/UTR, as will be understood by a skilled person.

In some embodiments, some of the genes of a GVGC are expressed at a lower level compared to other gvp genes of the GV gene cluster when expressed under a same promoter and regulatory regions (herein also indicated as bottleneck genes). In those embodiments, the stoichiometry of the expression of the bottleneck genes can be increased to provide an optimal functionality of the GVES in the mammalian cell.

In particular, in some of those embodiments, the polynucleotide construct herein described further comprises a booster construct to elevate the gene expression. For example, the booster construct can contain gvp genes J, F, G, L and K connected with a separation element such as the P2A element to elevate the expression of these genes. The booster construct containing gvp genes J, F, G, L and K can be comprised in one or more gene cassettes, each operatively connected with regulatory sequences to enable the expression of the gvp genes J, F, G, L and K. In those embodiments, when comprised in more than one operon, these genes are separated by a joint element such as the P2A element. In some embodiments, gvpJ and gvpK can also be used by themselves as boosters. gvpJ, F, G, L, K can also be on their own separate gene cassettes (e.g. on separate plasmids) and act as boosters.

In some embodiments the booster constructs can be comprised on one or more gene cassettes, where the use of promoter strengths can tune stoichiometry of the translated proteins. Stronger promoters can be used on the booster constructs while relatively weaker promoters can be used for the other cassette.

In some embodiments the booster constructs can be comprised on one or more gene cassettes, where the stability of the transcript can tune stoichiometry of the translated proteins. Regulatory elements that stabilize mRNAs (for example PolyA, WPRE) can be used on the booster constructs. For genes that need to be expressed at lower relative stoichiometries, these stability elements can be removed, or can be conditionally removed using siRNAs, shRNAs, aptazymes, Cas9 and the like, while the other GV cassette can include these mRNA stability elements.

In some embodiments the booster constructs can be comprised on one or more gene cassettes, where the use of degradation tags can tune stoichiometry of the translated proteins. Degradation tags target proteins for proteolysis, for example ubiquitin and a library of ubiquitin-fusion degradation tags (UbR, UbP, UbW, UbH, UbI, UbK, UbQ, UbV, UbL, UbD, UbN, UbG, UbY, UbT, UbS, UbF, UbA, UbC, UbE, UbM, 3xUbVR, 3xUbVV, 2xUbVR, 2xUbVV, UbAR, UbVV, UbVR, UbAV, 2xUbAR, 2xUbAV), the auxin-inducible degron (AID), the D-element, the PEST sequence, unstructured initiation sites, or short sequences rich in acceptor lysines. Genes on the booster constructs will not have these degradation tags, while degradation tags can be used for the other genes that need to be expressed at relatively lower levels. This approach can be used in combination with the promoter and transcript stability examples.

In some embodiments the booster constructs can be comprised on one or more gene cassettes, where micro-ORFs upstream of a cassette (ORF encoding GV genes) can be used to reduce the expression of GV proteins. Micro-ORFs are short open reading frames placed upstream of the ORF encoding the protein(s) of interest and result in the suppression of protein expression. They include a Kozak/start codon NNNATG, a small peptide and a stop codon (TGA, TAG, TAA), for example AAAATGGCCGCGCCCAGAGCGTAG, NNNATG(NNN)[TAG/TGA/TAA] (SEQ ID NO: 474) ([27]). For genes that need to be expressed at lower relative stoichiometries, micro-ORFs can be placed upstream of their cassette to reduce the expression level of these GV proteins.
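The micro-ORF layout described above (three context nucleotides, an ATG start codon, in-frame sense codons and a stop codon) can be checked with a minimal Python sketch. The function name `parse_micro_orf` and the returned field names are hypothetical and introduced only for illustration; the sketch validates the layout of a candidate sequence and makes no prediction of how strongly a given micro-ORF suppresses downstream expression.

```python
STOP_CODONS = {"TAG", "TGA", "TAA"}

def parse_micro_orf(sequence):
    """Parse a candidate micro-ORF laid out as NNN-ATG-(NNN)n-stop.

    Returns a dict describing the micro-ORF, or None when the sequence does
    not start with NNNATG or has no in-frame stop codon.
    """
    sequence = sequence.upper()
    if len(sequence) < 9 or sequence[3:6] != "ATG":
        return None
    codons = []
    for i in range(6, len(sequence) - 2, 3):  # walk codons after the ATG
        codon = sequence[i:i + 3]
        if codon in STOP_CODONS:
            return {
                "context": sequence[:3],   # the NNN preceding the start codon
                "codons": codons,          # sense codons between ATG and stop
                "stop": codon,
                "length_nt": i + 3,
            }
        codons.append(codon)
    return None

# Example micro-ORF sequence given in the text above.
print(parse_micro_orf("AAAATGGCCGCGCCCAGAGCGTAG"))
```

For the example sequence of the text, the sketch reports the context AAA, five sense codons (GCC, GCG, CCC, AGA, GCG) and the stop codon TAG, for a total length of 24 nucleotides.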

In some embodiments the booster constructs can be comprised on one or more gene cassettes, where the use of different inducible promoters (chemically or otherwise) can tune stoichiometry of translated proteins. Different promoters that are inducible by different stimuli can be used to drive expression of the booster construct and/or other cassettes. A higher amount of inducer can be used to increase the expression of booster constructs. For genes that need to be expressed at lower relative stoichiometries, a relatively lower amount of inducer can be used.

In some embodiments the booster constructs can be comprised on one or more gene cassettes, where the presence of enhancing introns can tune stoichiometry of the translated proteins. Intron-mediated enhancement can be used on the booster constructs. For genes that need to be expressed at lower relative stoichiometries, these introns can be omitted, while the other GV cassette can include these introns ([28], [29]).

In some embodiments the booster constructs can be comprised on one or more gene cassettes, where the stoichiometry of the translated proteins can be tuned by different modes of ribosome entry. Translation of the booster construct can be initiated via the stronger cap-dependent gene expression mediated by the Kozak sequence, and genes that need to be expressed at lower relative stoichiometries can be initiated via an Internal Ribosome Entry Site (IRES).

Accordingly, in some of these embodiments, the GVES of the disclosure comprises at least three cassettes, possibly on three different polynucleotides, wherein the first polynucleotide contains gas vesicle gene B, the second polynucleotide is the booster construct containing gas vesicle genes J, F, G, L and K connected with a separation element, and at least a third polynucleotide contains the gas vesicle genes N, F, G, L, S, K, J, and U (gvpN, gvpF, gvpG, gvpL, gvpS, gvpK, gvpJ, and gvpU). These gas vesicle genes N, F, G, L, S, K, J, and U can be comprised in one or more synthetic operons, each operatively connected with regulatory sequences to enable the expression of the gas vesicle genes N, F, G, L, S, K, J, and U. When comprised in more than one gene expression cassette, these genes are separated by a separation element such as the P2A element. In embodiments herein described the order of gvp genes within one or more cassettes is not important to determine functionality of the system. The co-transfection of these at least three polynucleotides is sufficient for robust expression of gas vesicles in cells, herein referred to as mammalian acoustic reporter gene (mammalian ARG) (see Examples 12 and 13). Additionally, this architecture can be further consolidated by connecting the gas vesicle protein B gene to the polycistronic construct using an IRES. When this new architecture is co-transfected into cells with the booster plasmid, it robustly produces gas vesicles.

In some embodiments, the GVES can comprise one cassette that encodes gvpB, one cassette that encodes gvpN, gvpF, gvpG, gvpL, gvpS, gvpK, gvpJ, and gvpU, and a booster with gvpJ, F, G, L, K as a polycistronic cassette and/or as a plurality of monocistronic cassettes. The cassettes can be on separate polynucleotides or on one or more polynucleotides. For example, the gvpB cassette can be comprised on a same polynucleotide construct together with the cassette comprising gvpN, gvpF, gvpG, gvpL, gvpS, gvpK, gvpJ, and gvpU, or on a same polynucleotide with one booster cassette comprising gvpJ, F, G, L (see e.g. construct of Example 8).

Additional embodiments with other GVGC clusters, e.g. comprising gvp genes from B. megaterium and/or genes from Anabaena flos-aquae, as well as additional clusters, are identifiable by a skilled person upon reading of the present disclosure.

In embodiments herein described, the GVES comprising a GVGC in two or more gene cassettes located on one or more polynucleotide constructs herein described, operatively connected to regulatory sequences, can be introduced into a mammalian host, allowing expression of the GV constructs and production of gas vesicles in the mammalian host.

In particular, in some embodiments, the method comprises introducing into the mammalian cell a genetically engineered Gas Vesicle expression system (GVES) herein described for a time and under conditions to allow expression of the gvp genes in the mammalian cell.

In some embodiments, the method comprises introducing into a cell of the mammalian host a genetically engineered Gas Vesicle expression system (GVES) herein described in which the gvp genes encode for proteins of the gas vesicle type, the introducing performed for a time and under conditions to allow expression of the gvp genes in the mammalian cell.

Expression of GV constructs in a mammalian cell can be performed by cloning one or more polynucleotides encoding naturally occurring GV proteins or homologs thereof that are required for production of GVs (comprising gvpB, gvpN, gvpF, gvpG, gvpL, gvpS, gvpK, gvpJ, and gvpU and other proteins known to those skilled in the art and described herein) into one or more suitable constructs configured to express the heterologous GV proteins in the mammalian cell. Polynucleotides encoding GV protein genes can be cloned using commercially available reagents from vendors such as Qiagen, Invitrogen, Applied Biosystems, Promega, New England BioLabs and others, following standard molecular biology methods known in the art, such as those described herein. As would be understood by those skilled in the art, polynucleotides encoding GV protein genes can be obtained from several different sources. For example, polynucleotides encoding GV proteins can be obtained by isolating genomic DNA or cDNA encoding GV proteins from microorganisms whose genomes encode GV protein genes and/or express GV protein RNA. RNA can be isolated from a cell that expresses GV protein genes, and cDNA produced by reverse transcription using standard techniques and commercial kits. Genomic DNA can be purified from the cell, and cDNA or genomic DNA encoding one or more GV proteins isolated, following methods known to those in the art. In addition or in the alternative, polynucleotides comprising one or more gas vesicle genes can be synthesized using oligonucleotide and polynucleotide synthetic methods known in the art. For example, if rare mammalian codons are identified following purification of genomic DNA from the cell, rare mammalian codons are preferably edited to improve expression in the target cell. PCR-based amplification of one or more GV protein genes can be performed using appropriately designed primer pairs (e.g. using PrimerDesign or other programs known to those skilled in the art). PCR-based amplification can be followed by ligation (e.g. using T4 DNA ligase) of a polynucleotide encoding a gas vesicle gene amplicon into an appropriate construct in a plasmid suitable for propagation in bacteria or archaea, such as transformation-competent E. coli DH5alpha or another competent E. coli type, followed by growth of transformed cell cultures and purification of the plasmid for confirmation of the cloned gene by DNA sequence analysis, among other methods known to those skilled in the art. Expression vectors can comprise plasmid DNA, viral vectors, or non-viral vectors, among others known to those skilled in the art, comprising appropriate regulatory elements such as promoters, enhancers, and post-transcriptional and post-translational regulatory sequences that are compatible with the mammalian cell intended to heterologously express the GV, as would be understood by a skilled person. In particular, in embodiments described herein, expression vectors suitable for regulating heterologous expression of GVs comprise those having promoters and other regulatory elements known to skilled persons that are compatible with mammalian cells, including cell lines and primary cells cultured in vitro (e.g. in petri dishes), or that are suitable for introducing the GV gene circuits into an animal to genetically engineer cells directly inside the animal, as described above. Promoters can be constitutively active or inducible (and chosen to be selectively expressed in different cell types). Exemplary inducible expression systems comprise tetracycline-inducible expression as shown in Examples 13 and 18.

In particular, in some embodiments described herein, GV gene sequences can be codon-optimized (for example to remove rare mammalian codons) for expression in the mammalian cell type according to methods identifiable by a skilled person. As would be understood by those skilled in the art, the term “codon optimization” as used herein refers to the introduction of synonymous mutations into codons of a protein-coding gene in order to improve protein expression in expression systems of a particular organism, such as human, in accordance with the codon usage bias of that organism. The term “codon usage bias” refers to differences in the frequency of occurrence of synonymous codons in coding DNA. The genetic codes of different organisms are often biased towards using one of the several codons that encode a given amino acid over others, and use the one codon with a greater frequency than expected by chance. Optimized codons in organisms reflect the composition of their respective genomic tRNA pool. The use of optimized codons can help to achieve faster translation rates and high accuracy (and ultimately higher recombinant protein yield).

In some embodiments, one or more statistical methods proposed and used to analyze codon usage bias in the field of bioinformatics and computational biology can be used for codon optimization in the sense of the disclosure. Methods such as the ‘frequency of optimal codons’ (Fop), the Relative Codon Adaptation (RCA) or the ‘Codon Adaptation Index’ (CAI) are used to predict gene expression levels, while methods such as the ‘effective number of codons’ (Nc) and Shannon entropy from information theory are used to measure codon usage evenness. Multivariate statistical methods, such as correspondence analysis and principal component analysis, are widely used to analyze variations in codon usage among genes. There are many computer programs to implement the statistical analyses enumerated above, including CodonW, GCUA, INCA, and others identifiable by those skilled in the art. Several software packages are available online for codon optimization of gene sequences, including those offered by companies such as GenScript, EnCor Biotechnology, Integrated DNA Technologies, ThermoFisher Scientific, among others known to those skilled in the art. Those packages can be used in providing GV proteins with codon usage ensuring optimized expression in various prokaryotic cell systems, as will be understood by a skilled person. In particular, codon optimization in embodiments herein described can be used primarily to remove or limit the use of rare codons, or to keep codon usage above ~10%.
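Two of the measures mentioned above can be illustrated with a minimal Python sketch: the Codon Adaptation Index, computed as the geometric mean of per-codon relative adaptiveness weights, and a rare-codon flag using a ~10% usage threshold. The function names and all numerical values in the usage example are hypothetical placeholders introduced only for illustration; they are not measured codon usage data for any organism, and a real analysis would substitute a published usage table for the host of interest.

```python
import math

def split_codons(cds):
    """Split a coding sequence into complete codons, ignoring trailing bases."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]

def codon_adaptation_index(cds, weights):
    """Geometric mean of relative-adaptiveness weights over the scorable codons."""
    logs = [math.log(weights[c]) for c in split_codons(cds) if c in weights]
    if not logs:
        raise ValueError("no codons with a known weight")
    return math.exp(sum(logs) / len(logs))

def rare_codons(cds, usage_fraction, threshold=0.10):
    """Return (codon_index, codon) for codons whose usage falls below the threshold."""
    return [
        (index, codon)
        for index, codon in enumerate(split_codons(cds))
        if codon in usage_fraction and usage_fraction[codon] < threshold
    ]

# Placeholder values for two alanine codons (illustrative only, not real data).
example_weights = {"GCC": 1.0, "GCG": 0.25}
example_usage = {"GCC": 0.40, "GCG": 0.08}

print(codon_adaptation_index("GCCGCG", example_weights))
print(rare_codons("GCCGCGGCC", example_usage))
```

Codons flagged by the second function are candidates for synonymous replacement, after which the index can be recomputed to confirm that the edited sequence scores higher under the same table.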

A mammalian cell used herein to include a GVES of the disclosure refers to a mammalian cell which can be transduced, infected, transfected or transformed with a vector under certain culture conditions. The vector can be a plasmid, a viral particle, or others identifiable to a person skilled in the art. The term mammalian cell refers to cells isolated from an animal (mammal) tissue and expanded in culture for use as therapeutic and research tools.

In some embodiments, the transformed mammalian cells can comprise one or more cells such as T-cells, hematopoietic stem cells, mesenchymal stem cells, neural precursor cells, macrophages, fibroblasts or cardiomyocytes and any cell where one can express reporter genes (e.g. Green fluorescent protein (GFP)).

In some embodiments, the transformed mammalian cells can be part of a tissue in vivo or ex vivo.

In some embodiments, the transformed mammalian cells can be isolated mammalian cells such as mammalian cell lines. Mammalian cell lines used herein refer to human or non-human mammalian recombinant expression systems capable of producing post-translational modifications which closely resemble those in mammalian cells in vivo. Exemplary non-human mammalian cell lines include CHO-K1, mouse myeloma cell lines such as NS0 and SP2/0, rat myeloma cell lines such as YB2/0, baby hamster kidney (BHK), N2A cells, NIH3T3, and others identifiable to a person skilled in the art. Human mammalian cell lines are immortalized cells propagated in vitro from primary explants of human tissue or body fluid. Exemplary human cell lines include HEK293 and its derivatives, HeLa, Jurkat, HT-1080, PER.C6, Huh-7, as well as others identifiable to a person skilled in the art.

In some embodiments, the transformation can occur in an individual of a mammalian species such as Homo sapiens or Mus musculus, for example, among others. In some embodiments, mammalian cells in the sense of the disclosure comprise stem cells, progenitor cells, induced pluripotent stem cells, and others identifiable by a skilled person.

In some embodiments herein described, the GVES can be introduced in a mammalian cell to provide a reportable molecular component (herein GVRMC) of a gas vesicle reporting (GVR) genetic circuit in operative connection with other molecular components of the genetic circuit to report occurrence of a biochemical event in the mammalian cell.

The term “molecular component” as used in connection with the GVR genetic circuits described herein indicates a chemical compound or a structure comprised of a plurality of chemical compounds comprised in a cellular environment. Exemplary molecular components thus comprise polynucleotides, such as ribonucleic acids or deoxyribonucleic acids, polypeptides, polysaccharides, lipids, amino acids, peptides, sugars and/or other small or large molecules and/or polymers that can be found in a cellular environment. In some embodiments described herein, a molecular component of a GVR genetic circuit is a GV type or a cluster thereof.

The term “genetic molecular component” as used herein indicates a molecular unit formed by a gene (possibly comprising or formed by a cluster of genes), an RNA transcribed from the gene or a portion thereof, and optionally a polypeptide or a protein translated from the transcribed RNA. In genetic circuits herein described, the biochemical reactions connecting the genetic molecular component to another molecular component of the circuit can involve any one of the gene, the transcribed RNA and/or the polypeptide forming the molecular component.

A gene comprised in a genetic molecular component is a polynucleotide that can be transcribed to provide an RNA and typically comprises coding regions as well as one or more regulatory sequence regions, each of which is a segment of a nucleic acid molecule capable of increasing or decreasing transcription or translation of the gene within an organism either in vitro or in vivo. In particular, coding regions of a gene herein described can comprise one or more protein coding regions which when transcribed and translated produce a polypeptide, or, if an RNA is the final product, only a functional RNA sequence that is not meant to be translated. Regulatory regions of a gene herein described comprise promoters, transcription factor binding sites, operators, activator binding sites, repressor binding sites, enhancers, protein-protein binding domains, RNA binding domains, DNA binding domains, silencers, insulators and additional regulatory regions that can alter gene expression in response to stimuli, as will be recognized by a person skilled in the art.

An RNA of a genetic molecular component comprises any RNA that can be transcribed from a gene, such as a messenger ribonucleic acid (mRNA), short interfering ribonucleic acid, or ribonucleic acid capable of acting as a regulating factor in the cell. mRNA comprised in a genetic molecular component comprises regions coding for the protein as well as regulatory regions. mRNA can have additional control elements encoded, such as riboregulator sequences or a protein binding aptamer sequence placed upstream of the gene so the protein blocks ribosomes and conditionally prevents translation. Other RNAs that serve regulatory roles that can comprise the genetic molecular component include riboswitches, aptamers (e.g. malachite green, Spinach), aptazymes, guide CRISPR RNAs, and other RNAs known to those skilled in the art.

A protein comprised in a molecular component can be a protein with activating, inhibiting, binding, converting, or reporting functions. Proteins that have activating or inhibiting functions typically act on operator sites encoded on DNA, but can also act on other molecular components. Proteins that have binding functions typically act on other proteins, but can also act on other molecular components. Proteins that have converting functions typically act on small molecules, and convert one small molecule to another by conducting a chemical or enzymatic reaction. Proteins with converting functions can also act on other molecular components. Proteins with reporting functions are easily detectable by commonly used detection methods (absorbance or fluorescence, for example), or otherwise cause a reaction on another molecular component that allows easy detection by a secondary assay (e.g. by adjusting the level of a metabolite that can then be assayed for). The activating, inhibiting, binding, converting, or reporting functions of a protein typically form the interactions between genetic components of a genetic circuit. Exemplary proteins that can be comprised in a genetic molecular component comprise monomeric proteins and multimeric proteins, proteins with tertiary or quaternary structure, proteins with linkers, proteins with non-natural amino acids, proteins with different binding domains, and other proteins known to those skilled in the art.

The term “cellular molecular component” indicates a molecular component not encoded by a gene, or indicates a molecular component transcribed and/or translated by a gene but comprised in the circuit without the corresponding gene. Exemplary cellular components comprise polynucleotides, polypeptides, polysaccharides, small molecules and additional chemical compounds that are present in a cellular environment and are identifiable by a skilled person. Polysaccharides, small molecules, and additional chemical compounds can include, for example, NAD, FAD, ATP, GTP, CTP, TTP, AMP, GMP, ADP, GDP, Vitamin B1, B12, citric acid, glucose, pyruvate, 3-phosphoglyceric acid, phosphoenolpyruvate, amino acids, PEG-8000, Ficoll 400, spermidine, DTT, β-mercaptoethanol, maltose, maltodextrin, fructose, HEPES, Tris-Cl, acetic acid, aTc, IPTG, 3OC12HSL, 3OC6HSL, vanillin, malachite green, Spinach, succinate, tryptophan, and others known to those skilled in the art. Polynucleotides can include RNA regulatory factors (small activating RNA, small interfering RNA), or “junk” decoy DNA that either saturates DNA-binding enzymes (such as exonuclease) or contains operator sites to sequester activator or repressor enzymes present in the system. Polypeptides can include those present in the genetic circuit but not produced by genetic components in the circuit, or those added to affect the molecular components of the circuit.

In embodiments of genetic circuits herein described, one or more molecular components is a recombinant molecular component that can be provided by genetic recombination (such as molecular cloning) and/or chemical synthesis to bring together molecules or related portions from multiple sources, thus creating molecular components that would not otherwise be found in a single source.

In a GVRMC of the disclosure, at least one gene expression cassette of the gene expression cassettes of the GVES of the disclosure comprises a gas vesicle reporting (GVR) target region configured to be activated and/or inhibited by a molecular component of a genetic circuit.

These GVR target regions can include genetic elements that allow control over cellular behavior through various biochemical processes including transcriptional control, translational control, post-translational control and other control processes identifiable to a person skilled in the art.

In some embodiments, the transcriptional control elements can include constitutive promoters, repressor and/or activator sites, recombination sites, inducible and/or tissue-specific promoters, or cell fate regulators. The translational control elements can include RNAi, riboregulators, RNA secondary structural motifs included in the GVES mRNA, or ribosome-binding sites. The post-translational control elements can include elements controlling phosphorylation cascades, protein receptor design, protein degradation elements, and localization signals. Examples of these regulatory regions and their functional purposes can be found in published review articles such as Purnick et al. ([30]) (for example Table 1 of Purnick) as will be understood by a person skilled in the art.

In embodiments herein described, a genetic circuit comprises at least one genetic molecular component or at least two genetic molecular components, and possibly one or more cellular molecular components, connected one to another in accordance with a circuit design by activating, inhibiting, binding or converting reactions to form a fully connected network of interacting components.

In embodiments of the GVR genetic circuits described herein, the molecular components are connected with one another according to a circuit design in which a molecular component is an input and another molecular component is an output. In particular, a genetic circuit typically has one or more input or start molecular components which activate, inhibit, bind and/or convert another molecular component, one or more output or end molecular components which are activated, inhibited, bound and/or converted by another molecular component, and intermediary molecular components each inhibiting, binding and/or converting another molecular component and being activated, inhibited, bound and/or converted by another molecular component. In some embodiments of the genetic circuits herein described, the input is a biochemical event and/or a trigger molecular component and the output is activation of expression of a GV gene cluster and assembly of a GV type through binding reactions between gvps of the GV type. In other embodiments of the genetic circuits herein described, the input is a biochemical event and/or a trigger molecular component and the output is an intracellular spatial translocation of the GV type, the intracellular spatial translocation occurring typically through one or more converting and/or binding reactions as described herein. The output of a GVR circuit herein described can be detected with ultrasound contrast, MRI SWI, light scattering and additional techniques to detect GVs identifiable by a skilled person upon reading of the present disclosure.
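By way of illustration only, the input, intermediary and output roles described above can be read off a circuit represented as a list of directed reactions. The following Python sketch assumes a hypothetical three-component circuit; the component names and the `classify_components` helper are illustrative assumptions and are not part of the disclosure.

```python
# Hypothetical circuit: each edge is (acting component, reaction, component acted upon).
# Component names are illustrative assumptions only.
EDGES = [
    ("trigger", "activates", "transcription_factor"),
    ("transcription_factor", "activates", "gv_gene_cluster"),
]


def classify_components(edges):
    """Split components into inputs, intermediaries and outputs.

    An input only acts on other components, an output is only acted upon,
    and an intermediary does both.
    """
    sources = {src for src, _, _ in edges}
    targets = {dst for _, _, dst in edges}
    return {
        "inputs": sources - targets,
        "intermediaries": sources & targets,
        "outputs": targets - sources,
    }


ROLES = classify_components(EDGES)
```

In this sketch the trigger is the only input, the transcription factor is the only intermediary, and the GV gene cluster is the only output.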

The term “activating” as used herein in connection with a molecular component of a genetic circuit refers to a reaction involving the molecular component which results in an increased presence of the molecular component in the cellular environment. For example, activation of a genetic molecular component indicates one or more reactions involving the gene, RNA and/or protein of the genetic molecular component resulting in an increased presence of the gene, RNA and/or protein of the genetic molecular component (e.g. by increased expression of the gene of the molecular component, and/or an increased translation of the RNA). An example of “activating” described herein comprises the initiation of expression of a GV gene cluster under the control of the tetracycline-inducible promoter (using reverse tetracycline-controlled transactivator) followed by the ultrasound response of mammalian ARGs (see, e.g., Examples 13 and 18).

Activation of a molecular component of a genetic circuit by another molecular component of the circuit can be performed by direct or indirect reaction of the molecular components. Examples of a direct activation of a genetic molecular component comprise in a circuit the production of an alternate sigma factor (molecular component of the circuit) that drives the expression of a gene controlled by the alternate sigma factor promoter (other molecular component of the circuit), or the production of a small ribonucleic acid (molecular component of the circuit) that increases expression of a riboregulator-controlled RNA (molecular component of the circuit). Examples of indirect activation of a genetic molecular component comprise the production of a first protein that inhibits an intermediate transcriptional repressor protein, wherein the intermediate transcriptional repressor protein represses the production of a target gene, such that the first protein indirectly activates expression of the target gene.
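The indirect activation described above (a first protein inhibiting a repressor that in turn represses a target gene) can be summarized by multiplying the signs of the reactions along the path. The following Python sketch is a simplified qualitative model offered for illustration; the constants and the `net_effect` helper are assumptions, and real circuits depend on concentrations and kinetics that a sign model does not capture.

```python
from math import prod

ACTIVATES = 1   # reaction that increases presence of its target
INHIBITS = -1   # reaction that decreases presence of its target


def net_effect(path_signs):
    """Return 1 if a chain of reactions is net activating, -1 if net inhibiting."""
    return prod(path_signs)


# first protein --| intermediate repressor --| target gene
INDIRECT_ACTIVATION = [INHIBITS, INHIBITS]
NET = net_effect(INDIRECT_ACTIVATION)
```

Two successive inhibitions give a net positive sign, which corresponds to the first protein indirectly activating expression of the target gene.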

The term “inhibiting” as used herein in connection with a molecular component of a genetic circuit refers to a reaction involving the molecular component of the genetic circuit and resulting in a decreased presence of the molecular component in the cellular environment. For example, inhibition of a genetic molecular component indicates one or more reactions involving the gene, RNA and/or protein of the genetic molecular component resulting in a decreased presence of the gene, RNA and/or protein (e.g. by decreased expression of the gene of the molecular component, and/or a decreased translation of the RNA). Inhibition of a cellular molecular component indicates one or more reactions resulting in a decreased production or increased conversion, sequestration or degradation of the cellular molecular components (e.g. a polysaccharide or a metabolite) in the cellular environment.

Inhibition can be performed in the genetic circuit by direct reaction of a molecular component of the genetic circuit with another molecular component of the circuit, or indirectly by reaction of products of a reaction of the molecular components of the genetic circuit with the other molecular component of the circuit.

The term “binding” as used herein in connection with molecular components of a genetic circuit refers to the connecting or uniting of two or more molecular components of the circuit by a bond, link, force or tie in order to keep two or more molecular components together, which encompasses either direct or indirect binding where, for example, a first molecular component is directly bound to a second molecular component, or one or more intermediate molecules are disposed between the first molecular component and the second molecular component of the circuit. Exemplary bonds comprise covalent bonds, ionic bonds, van der Waals interactions and other bonds identifiable by a skilled person.

In some embodiments, the binding can be direct, such as the production of a polypeptide scaffold that directly binds to a scaffold-binding element of a protein. In other embodiments, the binding may be indirect, such as the co-localization of multiple protein elements on one scaffold. In some instances, binding of a molecular component with another molecular component can result in sequestering the molecular component, thus providing a type of inhibition of said molecular component. In some instances, binding of a molecular component with another molecular component can change the activity or function of the molecular component, as in the case of allosteric interactions between proteins, thus providing a type of activation or inhibition of the bound component.

The term “converting” as used herein in connection with a molecular component of the circuit refers to the direct or indirect conversion of the molecular component into another molecular component. An example of this is the conversion of chemical X by protein A to chemical Y that is then further converted by protein B to chemical Z.

In the GVR genetic circuits in the sense of the present disclosure, the gvp genes and related cassettes included with a GVES of the disclosure are introduced into a mammalian cell to provide a reportable molecular component connected with other genetic or cellular molecular components according to a circuit design, wherein the GV type is expressed or the GV type is intracellularly spatially translocated when the GVGC genetic circuit operates according to the circuit design in response to a biochemical event and/or to a trigger molecular component.

The term “reportable molecular component” as used herein indicates a molecular component capable of detection in one or more systems and/or environments. The terms “detect” or “detection” as used herein indicate the determination of the existence, presence or fact of a target in a limited portion of space, including but not limited to a sample, a reaction mixture, a molecular complex and a substrate. The “detect” or “detection” as used herein can comprise determination of chemical and/or biological properties of the target, comprising ability to interact, and in particular bind other compounds, ability to activate another compound and additional properties identifiable by a skilled person upon reading of the present disclosure. The detection can be quantitative or qualitative. A detection is “quantitative” when it refers, relates to, or involves the measurement of quantity or amount of the target or signal (also referred to as quantitation), which includes but is not limited to any analysis designed to determine the amounts or proportions of the target or signal. A detection is “qualitative” when it refers, relates to, or involves identification of a quality or kind of the target or signal in terms of relative abundance to another target or signal, which is not quantified. In particular, in embodiments herein described detection of the reportable molecular component comprising a GV type is performed through contrast enhanced imaging techniques such as ultrasound and MRI (and light scattering).

The term “biochemical event” as used herein refers to an activating, inhibiting, binding or converting reaction between two or more molecular components within a mammalian cell.

Accordingly, in some embodiments, at least one genetic molecular component of the GVR genetic circuit comprises a GVB cassette and additional GVP cassettes of the GVES of the disclosure comprising the gvpB gene and the gvpN, gvpF, gvpG, gvpL, gvpS, gvpK, gvpJ, and gvpU genes, in a gas vesicle (GV) gene cluster in which the GV genes are operatively connected to a promoter configured to be activated directly or indirectly by the biochemical event, and directly initiate expression of a GV type.

In some embodiments herein described, a genetic molecular component of the GVR genetic circuit comprises a gas vesicle (GV) gene cluster comprising the GVB cassette and additional GVP cassettes of the GVES of the disclosure, in which the gvpB gene and the gvpN, gvpF, gvpG, gvpL, gvpS, gvpK, gvpJ, and gvpU genes are configured to be activated directly or indirectly by the biochemical event, and directly initiate expression of a GV type through interactions with promoters as well as one or more enhancers and/or other regulatory DNA elements comprised within the GVB and/or additional GVP cassettes, which are identifiable by those skilled in the art. As would be understood by those skilled in the art, promoters are DNA regulatory elements that are typically located adjacent to the transcription start sites of genes, or a cluster of genes, on the same strand and upstream on a DNA sequence (towards the 5′ region of the sense strand). For transcription to occur, the enzyme that synthesizes RNA, known as RNA polymerase, attaches to the promoter. Promoters contain DNA sequences identifiable by those skilled in the art, such as those that provide binding sites for RNA polymerase and also for proteins that function as transcription regulatory factors that can either activate or repress gene transcription.

The term “transcription regulatory factor” or “transcription factor” as used herein refers to any type of factors that can function by acting on a regulatory DNA element such as a promoter or enhancer sequence. The transcription regulatory factors can be broadly classified into a transcription repression factor (also referred to as “repressor”) and a transcription activation factor (also referred to as “activator”). The transcription repression factor acts on a regulatory DNA element to repress the transcription of a gene, thereby reducing the expression level of the gene. The transcription activation factor acts on a regulatory DNA element to promote the transcription of a gene, thereby increasing the expression level of the gene.

In particular, a transcription regulatory factor has typically at least one DNA-binding domain that can bind to a specific sequence of enhancer or promoter sequences. Some transcription factors bind to a DNA promoter sequence near the transcription start site and help form the transcription initiation complex. Other transcription factors bind to other regulatory sequences, such as enhancer sequences, and can either stimulate or repress transcription of the related gene.

Examples of specific transcription repression factors include KRAB, repressor domains of proteins Egr-1, Oct2A, Dr1, YY1, RE-1 silencing transcription factor (REST), Retinoblastoma protein, and MeCP2, mSin interaction domain, TALE repressors, and others identifiable by a skilled person, as well as homologues of known repression factors, that function in both prokaryotic and eukaryotic systems. Examples of transcription activation factors include VP-16, VP-64 and others, as well as homologues of known activation factors, that function in eukaryotic systems.

In some embodiments, one or more promoters operatively connected to one or more GVGC genes comprised within the GVB cassette and additional GVP cassettes of the GVES of the disclosure can be configured to be activated directly or indirectly by one or more biochemical events. In particular, in some embodiments, activation of expression of GV genes introduced in a mammalian cell can be linked to another molecular component in the GVR genetic circuit through activator or repressor transcription factors. In some embodiments, expression of the transcription factors can be regulated by a promoter of interest (see Examples section). In other embodiments, transcription factors can be regulated post-translationally through degradation or phosphorylation of the transcription factor.

Accordingly, the reportable genetic molecular component of the GVR genetic circuit comprising the GVB cassette and additional GVP cassettes of the GVES of the disclosure, in which the gvpB gene and the gvpN, gvpF, gvpG, gvpL, gvpS, gvpK, gvpJ, and gvpU genes are operatively connected to a promoter configured to be activated directly or indirectly by the biochemical event, and directly initiate expression of a GV type, can in several embodiments comprise promoters and/or other DNA regulatory elements having one or more sequences identifiable to those skilled in the art that are configured to function as binding sites for any known transcription regulatory factor.

For example, in some embodiments expression of GV genes in a GVR circuit of the disclosure can be activated by promoters inducible by sugars (e.g., L-arabinose, L-rhamnose, xylose and sucrose), antibiotics (e.g., tetracycline), or CRISPR-dCas9 (possibly in conjunction with conditionally active gRNAs), as well as by heat shock promoters, pH-dependent promoters, oxidation stress-dependent promoters, radiation-dependent promoters, metal-inducible promoters, inflammation factor-inducible promoters, signaling factor-inducible promoters and others identifiable by those skilled in the art. In other embodiments expression of GV genes can be induced by activation of constitutive promoters of varying strengths that are suitable for regulating expression in mammalian cells described herein and identifiable by those skilled in the art.

In other embodiments, the GV gene or one or more of the regulatory elements of a GVR circuit of the disclosure is surrounded by recombination sites that are recognized by a recombinase, whose expression or activity is connected through the genetic circuit to a biochemical event in the mammalian cell. For example, GV genes introduced in the mammalian cell in reverse (3′-5′) orientation to their promoter (in 5′-3′ orientation) can be flanked by recombination sites surrounding the GV genes, with the recombination sites configured to allow inversion of the hybrid GV gene cluster upon expression or activation of its respective recombinase, wherein upon recombination the hybrid GV gene cluster is flipped into a 5′-3′ orientation to allow initiation of expression by the promoter. Suitable recombination systems for use in mammalian cells are identifiable by those skilled in the art, such as the piggyBac integrase system, phiC31 and Bxb1 integrases, and the FLP/FRT or Cre/lox recombination systems, and additional systems identifiable by a skilled person.
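The recombinase-mediated inversion described above can be pictured as reverse-complementing the segment between two recombination sites. The following Python sketch is a toy illustration under stated assumptions: the `PROMOTER` and `GENE` strings are made-up placeholder sequences, the recombination sites themselves are omitted, and `invert_between` is a hypothetical helper rather than a model of any particular recombinase.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def invert_between(construct, start, end):
    """Invert the segment construct[start:end], as a recombinase would
    invert DNA flanked by a pair of recombination sites."""
    return construct[:start] + reverse_complement(construct[start:end]) + construct[end:]


# Placeholder sequences, for illustration only.
PROMOTER = "TTGACA"
GENE = "ATGGCCTAA"

# OFF state: the gene is stored in reverse orientation relative to the promoter.
OFF_STATE = PROMOTER + reverse_complement(GENE)

# ON state: inversion restores the gene to the promoter's orientation.
ON_STATE = invert_between(OFF_STATE, len(PROMOTER), len(OFF_STATE))
```

Applying `invert_between` a second time to the same coordinates returns the OFF state, which reflects why the choice of recombination system (reversible or unidirectional) matters for circuit design.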

In embodiments described herein, a GV gene cluster introduced by the GVES of the disclosure comprised in one or more genetic molecular components of the GVR genetic circuits described herein is configured to function as a set of reporter genes, which together encode proteins required for the formation of a GV type, such that expression of the GV type functions as a genetically-encoded reporter of the biochemical event in the mammalian cell comprising a GVR genetic circuit. As described herein, the reportable characteristics of the GV are such that the genetically-encoded GV can be used as a contrast agent, which, when used together with one or more contrast-enhanced imaging techniques described herein, functions as a genetically-encoded reporter in mammalian cells that have been genetically engineered to comprise one or more of the GVR genetic circuits described herein.

In particular, in exemplary embodiments described herein, all the GV genes of the cluster (e.g. gvpF, gvpG, gvpJ, gvpL, gvpK, gvpS, gvpU and gvpA) enable GV formation. Therefore, if expression of any one of these genes is regulated according to the design of a GVR genetic circuit as described herein, then the expression of the GV type will be regulated accordingly.

In some embodiments, the GVR genetic circuits described herein can comprise a plurality of genetic molecular components that function as Boolean logical operators in genetic circuit designs known to those skilled in the art, such as those described in [31, 32]. As would be understood by persons skilled in the art, Boolean logic is a branch of algebra in which the values of the variables are the truth values ‘true’ and ‘false’, usually denoted by the digital logic terms ‘1’ and ‘0’ respectively. In contrast with elementary algebra where the values of the variables are numbers, and the main operations are addition and multiplication, the main operations of Boolean logic are the conjunction ‘AND’, the disjunction ‘OR’, and the negation ‘NOT’. As understood by those skilled in the art, it is thus a formalism for describing logical relations in the same way that ordinary algebra describes numeric relations.

Accordingly, the term “AND gate” refers to a digital logic gate that behaves according to the truth table shown in Table 3. A ‘true’ output (1) results only if both the inputs to the AND gate are ‘true’ (1). If neither or only one input to the AND gate is ‘true’ (1), a ‘false’ (0) output results. Therefore, the output is always 0 except when all the inputs are 1.

TABLE 3 ‘AND gate’ truth table
Input A    Input B    Output (A AND B)
0          0          0
0          1          0
1          0          0
1          1          1

In particular, the term “AND gate” as used herein refers to the logical relation between two genetic molecular components in a GVR genetic circuit, wherein inputs ‘A’ and ‘B’ in Table 3 are two biochemical events, and the output ‘A AND B’ in Table 3 is the production of a certain GV type.

For example, in some embodiments of an “AND gate” comprised in a GVR genetic circuit described herein, the GVR genetic circuit comprises a plurality of genetic molecular components wherein at least a first genetic molecular component comprises a first subset of genes from the GV gene cluster, and at least a second genetic molecular component comprises a second subset of genes from the GV gene cluster, wherein together the GV proteins expressed from the first and second genetic molecular components are configured to form a GV type. In these embodiments, activation of both the first AND second genetic molecular component is required for the output of the GV type in the genetic circuit when the genetic circuit operates according to the design of the genetic circuit. For example, the first and second genetic molecular components can comprise promoters that are activated by two or more biochemical events in the mammalian cell comprising the GVR genetic circuit.

In exemplary embodiments, any of gvpN, gvpF, gvpG, gvpJ, gvpL, gvpK, gvpS, gvpU and gvpA of a GV gene cluster formed by the gvpB gene and the gvpN, gvpF, gvpG, gvpL, gvpS, gvpK, gvpJ, and gvpU genes within the GVB cassette and additional GVP cassettes of the GVES of the disclosure can be split into at least a first and second genetic molecular component comprising at least a first and a second subset of these genes to form an AND gate.
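The split-cluster AND gate described above can be sketched as a set-membership check: a GV type is formed only when the genes expressed from both genetic molecular components together cover the whole cluster. In the following Python sketch the gene names are taken from the text, whereas the particular split into `COMPONENT_1` and `COMPONENT_2` and the function names are assumptions chosen for illustration.

```python
# Genes of the GV gene cluster as listed in the text.
REQUIRED_GENES = frozenset(
    {"gvpB", "gvpN", "gvpF", "gvpG", "gvpL", "gvpS", "gvpK", "gvpJ", "gvpU"}
)

# Hypothetical split of the cluster into two genetic molecular components.
COMPONENT_1 = frozenset({"gvpB", "gvpN", "gvpF", "gvpG"})
COMPONENT_2 = REQUIRED_GENES - COMPONENT_1


def gv_formed(event_a, event_b):
    """Return True only if both components are activated, so that the
    expressed genes together cover the whole GV gene cluster."""
    expressed = set()
    if event_a:  # biochemical event A activates the first component
        expressed |= COMPONENT_1
    if event_b:  # biochemical event B activates the second component
        expressed |= COMPONENT_2
    return REQUIRED_GENES <= expressed


def and_truth_table():
    """Enumerate (A, B, output) rows in the order used by Table 3."""
    return [(a, b, int(gv_formed(a, b))) for a in (0, 1) for b in (0, 1)]
```

Under these assumptions `and_truth_table()` reproduces the rows of Table 3, with output 1 only when both events occur.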

In other embodiments of an “AND gate” comprised in a GVGC genetic circuit, two or more regulatory elements operatively connected to a GV gene cluster comprised in a genetic molecular component of a GVGC genetic circuit that is activated by biochemical events A AND B would result in the output of the GV type in the GVGC genetic circuit. For example, the promoter requires binding of two transcriptional activators for activation of the promoter. In Examples described herein (see the Methods section of the Examples), GV gene clusters of exemplary ARG1, ARG2 and A2C constructs are driven by the T7 promoter that has a lac operator downstream of the promoter. The T7 RNA Polymerase is regulated by the araBAD promoter (inducible by L-arabinose). The lac operator is repressed by LacI (IPTG derepresses). Therefore GVs are expressed only under conditions wherein both IPTG AND L-arabinose are present.
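The T7/lac example above reduces to two independent conditions that must both hold. The following Python sketch expresses that logic in a minimal, all-or-nothing form; it is an idealization for illustration (leaky expression and dose dependence are ignored), and the function and variable names are assumptions.

```python
def gvs_expressed(iptg_present, l_arabinose_present):
    """Idealized logic of the T7 promoter with a downstream lac operator."""
    # The araBAD promoter produces T7 RNA polymerase only with L-arabinose.
    t7_polymerase_available = l_arabinose_present
    # LacI blocks the lac operator unless IPTG derepresses it.
    lac_operator_free = iptg_present
    return t7_polymerase_available and lac_operator_free


# Rows are (IPTG, L-arabinose, GVs expressed).
INDUCER_TABLE = [
    (iptg, ara, int(gvs_expressed(bool(iptg), bool(ara))))
    for iptg in (0, 1)
    for ara in (0, 1)
]
```

Only the row in which both inducers are present yields GV expression, matching the AND relation stated in the text.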

The term “OR gate” refers to a digital logic gate that behaves according to the truth table shown in Table 4. A ‘true’ output (1) results if either or both of the inputs to the OR gate are ‘true’ (1).

TABLE 4 ‘OR gate’ truth table
Input A    Input B    Output (A OR B)
0          0          0
0          1          1
1          0          1
1          1          1

In particular, the term “OR gate” as used herein refers to the logical relation between two genetic molecular components in a GVGC genetic circuit, wherein inputs ‘A’ and ‘B’ in Table 4 are two biochemical events, and the output ‘A OR B’ in Table 4 is the production of a certain GV type.

For example, in some embodiments of an “OR gate” comprised in a GVGC genetic circuit described herein, a promoter operatively connected to a GV gene cluster comprised in a genetic molecular component of a GVGC genetic circuit that is activated by biochemical events A OR B would result in the output of the GV type in the GVGC genetic circuit. For example, the promoter is activated by binding of either of two different transcriptional activators.

In other embodiments, an OR gate can be achieved through the use of two consecutive promoters. In exemplary embodiments, both these promoters can be located directly upstream of the GV gene cluster or they can be independently located directly upstream of any one or more of the gvpN, gvpF, gvpG, gvpJ, gvpL, gvpK, gvpS, gvpU or gvpA genes.
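The two-consecutive-promoter arrangement described above can be sketched in the same idealized way: transcription of the GV gene cluster proceeds if either upstream promoter is active. The function names in the following Python sketch are assumptions made for illustration.

```python
def gv_cluster_transcribed(promoter_1_active, promoter_2_active):
    """Either of two consecutive upstream promoters suffices for transcription."""
    return promoter_1_active or promoter_2_active


def or_truth_table():
    """Enumerate (A, B, output) rows in the order used by Table 4."""
    return [
        (a, b, int(gv_cluster_transcribed(bool(a), bool(b))))
        for a in (0, 1)
        for b in (0, 1)
    ]
```

Under these assumptions `or_truth_table()` reproduces the rows of Table 4.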

In other embodiments, GV genes introduced in the mammalian cell with a GVES of the disclosure can be flanked by recombination sites that are recognized by a recombinase, whose expression or activity is, in turn, activated in response to a biochemical event in the mammalian cell. For example, in these embodiments, one input signal can activate the GV genes organized within a GV gene cluster while a constitutive promoter is positioned in the opposite direction of the gene cluster. The second input would drive a recombinase that flips the promoter so that GV genes can be expressed. Exemplary recombinase systems comprise the piggyBac integrase system, phiC31 and Bxb1 integrases, and the FLP/FRT or Cre/lox recombination systems, and additional systems identifiable by a skilled person.

The term “Negated AND gate” or “NOT gate” refers to a digital logic gate that behaves according to the truth table shown in Table 5. A ‘true’ output (1) results only if input A is ‘true’ (1) and input B is ‘false’ (0).

TABLE 5 ‘Negated AND gate’ or “NOT gate” truth table
Input A    Input B    Output (A NOT B)
0          0          0
0          1          0
1          0          1
1          1          0

In particular, the term “Negated AND gate” or “NOT gate” as used herein refers to the logical relation between two genetic molecular components in a GVGC genetic circuit, wherein inputs ‘A’ and ‘B’ in Table 5 are two biochemical events, and the output ‘A NOT B’ in Table 5 is the production of a certain GV type.

For example, in some embodiments of a “Negated AND gate” or a “NOT gate” comprised in a GVR genetic circuit described herein, the GVGC genetic circuit comprises a plurality of genetic molecular components wherein at least a first genetic molecular component comprises a GV gene cluster, and at least a second genetic molecular component comprises a CRISPR/Cas9 complex configured to inhibit expression of a gvp gene comprised in the GV gene cluster, e.g. a gvpA. In these embodiments, activation of expression of the first genetic molecular component and absence of activation (or repression) of the second genetic molecular component are both required for the output of a GV type in the genetic circuit when the genetic circuit operates according to the design of the genetic circuit. For example, the first and second genetic molecular components can comprise promoters that are activated or repressed by one or more biochemical events in the mammalian cell comprising the GVGC genetic circuit. In embodiments of the genetic circuits herein described wherein the input is a biochemical event and the output is an intracellular spatial translocation of the GV type, the GV type is a molecular component of the genetic circuit and intracellular spatial translocation of the GV type can occur through one or more converting and/or binding reactions involving the GV type as described herein.
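The gate of Table 5, in which a GV gene cluster is activated by one event and a CRISPR/Cas9-based inhibitor of a gvp gene is activated by another, can be sketched as "A and not B". The following Python sketch is an idealized illustration; the function names are assumptions and complete repression by the second component is assumed.

```python
def gv_output(cluster_activated, inhibitor_activated):
    """GV type is produced only if the GV gene cluster is activated (input A)
    and the CRISPR/Cas9-based inhibitor is not activated (input B)."""
    return cluster_activated and not inhibitor_activated


def a_not_b_truth_table():
    """Enumerate (A, B, output) rows in the order used by Table 5."""
    return [
        (a, b, int(gv_output(bool(a), bool(b))))
        for a in (0, 1)
        for b in (0, 1)
    ]
```

Under these assumptions `a_not_b_truth_table()` gives output 1 only for A = 1 and B = 0, as in Table 5.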

In some embodiments, in the GVR genetic circuit herein described, an expression of the GV type or an intracellular spatial translocation of the GV type occurs when the hybrid GVR genetic circuit operates according to the circuit design in response to a trigger molecular component within the target mammalian cell.

In some embodiments, the trigger molecular component is a molecular component that is capable of being natively produced in the target host in its naturally occurring form. In particular, the natively produced molecular component can be a genetic molecular component or a cellular molecular component.

Examples of natively produced genetic molecular components include one or more RNAs or proteins natively encoded in the genome of the naturally occurring form of the mammalian host and natively expressed by the target mammalian host. Examples of cellular molecular components natively produced by the target host comprise metabolites of enzymatic reactions produced by enzymes that are natively expressed by the target mammalian host in its naturally occurring form.

In these embodiments, the GVR genetic circuit comprises a GV type when the GVR genetic circuit operates according to a circuit design in response to the presence of the natively produced molecular component in the target mammalian cell.

In particular, in these embodiments, expression of the GVR in the mammalian host does not require the introduction into the host of any genetic molecular components in addition to the genetic molecular components comprising the GVGC. In these embodiments, the promoter operatively connected to a hybrid GV gene cluster in the GVGC genetic molecular component is configured to be activated in response to molecular components capable of being natively produced by the host in its naturally occurring form, such as natively expressed transcription factors. Genetic molecular components that can be activated by native molecular components include response elements (activating transcription factor 4 response element, activator protein 1 response element, antioxidant response element, cAMP response element, enhancer binding protein response element, hypoxia response element, metal response element, NFAT response element, p53 response element, serum response element, Smad binding element, xenobiotic response element); additional examples are identifiable by those skilled in the art. Natively produced proteins or RNAs natively encoded in the genome of particular mammalian cell hosts comprise transcription factors (SP-1, AP-1, C/EBP, EGR1, HSF, ATF/CREB, GLI1, HIF, c-Myc, Oct-1, p53, NF-1, STAT1) and lncRNAs (B2, roX1, roX2, Xist); additional examples are identifiable by those skilled in the art. Metabolites produced in biochemical reactions occurring in the naturally occurring form of the mammalian host comprise cytokines such as chemokines, interferons (IFNγ), interleukins (IL-2, IL-10), lymphokines (CSF1, CSF2, CSF3), and tumor necrosis factors (TNFα), as well as hormones (including endocrine, paracrine, autocrine, and intracrine hormones) and growth factors (BMP, EGF, ephrin, EPO, FGF); additional examples are identifiable by those skilled in the art.

Thus, in these embodiments, the target host mammalian cell is labeled with expression of a GV type, wherein expression of the GV type occurs in the presence of the trigger molecular component that is capable of being natively produced in the target mammalian cell host in its naturally occurring form. In several embodiments described herein, one or more GVR genetic circuits can be introduced into one or more mammalian cell hosts according to genetic engineering methods described herein and known to those skilled in the art. Different cells can thus express different GV types. The methods to introduce the GVES and related GVRMC are identifiable by a skilled person upon reading of the disclosure.

In other embodiments, the trigger molecular component is a heterologous molecular component that is not capable of being natively produced in the target mammalian host in its naturally occurring form. In these embodiments, the GVGC genetic molecular component is not configured to express the GV type in presence of a molecular component that is capable of being natively produced in the target mammalian host in its naturally occurring form, but is instead configured to express the GV type in presence of one or more heterologous (non-natively produced) trigger molecular components, e.g. by using cell type specific promoters, described above, and/or viral transduction which would be cell type specific.

In these embodiments, the trigger molecular component can be one or more heterologous molecular components comprising a heterologous genetic molecular component and/or a heterologous cellular molecular component.

In some embodiments, the heterologous genetic molecular component can comprise one or more protein- and/or RNA-encoding genes and/or regulatory elements such as promoters and/or enhancer elements that are not native to the target mammalian genome. In some embodiments, the heterologous genetic molecular component can be introduced into the target mammalian host in addition to the one or more genetic molecular components comprising the GVGC. The additional heterologous genetic molecular component can be a constitutively expressed or an inducible genetic molecular component.

In some embodiments, the heterologous cellular molecular component can comprise a molecular component that is naturally present in the environment comprising the target mammalian cell, such as a metabolite produced by a mammalian host comprising the target mammalian host cell, or it can be a molecular component that is not naturally present in the environment comprising the target mammalian host cell, and is introduced into the mammalian host cell, such as a drug configured to activate expression of the heterologous genetic component.

Accordingly, the GVR circuit of the disclosure comprises a first GVES reporting molecular component, which is a GVES genetic molecular component comprising the GVB cassette, and at least one second GVES reporting molecular component, which is a GVES genetic molecular component comprising the additional GVP cassettes of the GVES of the disclosure. In the GVR circuit of the disclosure, the first GVES reporting molecular component and the at least one second GVES reporting molecular component are activated to trigger expression of GV genes gvpB, gvpN, gvpF, gvpG, gvpL, gvpS, gvpK, gvpJ, and gvpU to provide the gas vesicle in the mammalian cell.

In some embodiments, the GVES genetic molecular component of a GVR circuit in a mammalian host according to the present disclosure comprises promoter and/or enhancer elements that are configured to be activated in response to the presence of a heterologous molecular component. In exemplary embodiments, the promoter is a constitutive promoter such as CMV (e.g., see Example 12, Example 15, Example 24, Example 25). In other exemplary embodiments, the promoter is activated by a heterologous transcription factor that is encoded in a heterologous genetic molecular component introduced into the target mammalian host in addition to the GVGC genetic molecular component; in exemplary embodiments described herein, the GVGC genetic molecular component comprises a promoter controlled by heterologous transcription factors, for example a tetracycline-dependent repressor fused to a transactivation domain (VP16 domain) as illustrated in Examples 13 and 18, and similarly LacI and LexA fusions to transactivators (e.g. VP16) and repressor domains (KRAB), an ET-dependent macrolide-responsive promoter, dead-Cas9 fusions to transactivators and repressors, zinc-finger proteins fused to transactivators and repressors, and transcription activator-like effectors fused to transactivators and repressors.

In some embodiments, the GVGC genetic molecular component comprises recombination sites (e.g. piggyBac recombination sites) surrounding one or more gvp genes comprised in the GV gene cluster or one or more regulatory elements (e.g. promoter), wherein the one or more gvp genes or regulatory elements are introduced into a mammalian host cell in an orientation that prevents expression of the encoded GV type, e.g., the promoter is in reverse orientation relative to the GV gene cluster; in these embodiments a heterologous genetic molecular component comprising the recombinase enzymes required for flipping the orientation of the elements flanked by the recombinase sites in the GVGC genetic molecular component is also introduced into the mammalian host cell, and expression of the GV type occurs upon recombinase-mediated flipping of the flanked elements in the GVGC genetic molecular component into an orientation allowing initiation of expression of the GV type.
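By way of illustration only, the orientation-dependent switch described above can be sketched as a string operation on a toy construct: the segment flanked by two recombination sites is replaced by its reverse complement, which brings an inverted promoter into the expressing orientation. The site sequences, the promoter motif and the function names below are hypothetical placeholders chosen for the sketch and are not sequences of the disclosure.

```python
# Toy model of recombinase-mediated inversion of a flanked element.
# All sequences are hypothetical placeholders, not sequences of the disclosure.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def flip_flanked(construct: str, site_left: str, site_right: str) -> str:
    """Invert the segment lying between the two recombination sites."""
    start = construct.index(site_left) + len(site_left)
    end = construct.index(site_right, start)
    flipped = reverse_complement(construct[start:end])
    return construct[:start] + flipped + construct[end:]


SITE_L = "AACCGG"      # hypothetical left recombination site
SITE_R = "CCGGTT"      # hypothetical right recombination site
PROMOTER = "TATAAT"    # toy promoter motif
GENE = "ATGAAATAA"     # toy open reading frame standing in for a gvp gene

# "Off" state: the promoter is delivered in reverse orientation.
construct_off = "GG" + SITE_L + reverse_complement(PROMOTER) + SITE_R + GENE

# "On" state: the recombinase flips the flanked promoter.
construct_on = flip_flanked(construct_off, SITE_L, SITE_R)
```

In this sketch a second flip restores the original construct, which mirrors the reversible nature of a simple inversion; whether a given recombinase system is reversible in practice depends on the sites and enzyme used.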

In these embodiments, the GVR genetic circuit comprises a GV type when the GVR genetic circuit operates according to a circuit design in response to the presence of the one or more heterologous molecular components in the target mammalian cell.

Thus, in these embodiments, the target mammalian host is labeled with expression of a GV type, wherein expression of the GV type occurs in presence of the heterologous trigger molecular component introduced into the target mammalian host.

Accordingly, in some embodiments, a method to provide a genetically engineered mammalian cell comprising one or more GVR genetic circuits is described. The method comprises genetically engineering a mammalian cell by introducing into the cell one or more GVR genetic circuits described herein.

The mammalian cells described herein can be genetically engineered using methods known to those skilled in the art. For example, one or more genetic molecular components of a GVR genetic circuit comprised in vectors described herein can be introduced into mammalian cells using delivery techniques such as lentivirus, adeno-associated virus, adenovirus, baculovirus, nanoparticles that contain genome editing enzymes such as CRISPR, TALENs, ZFNs, transposase and others known to those skilled in the art and described herein. In some embodiments, the genetic molecular components of a GVR genetic circuit are introduced into the mammalian cell to persist as a plasmid or integrate into the genome, following methods known in the art and described herein.

In embodiments herein described, the GVES system and related genetic circuits, cells, vectors, genetically engineered mammalian cells, compositions, methods and systems can, in several embodiments, be used together with contrast-enhanced imaging techniques to detect and report the location of, and/or biochemical events in, genetically engineered mammalian cells in an imaging target site.

The term “contrast enhanced imaging” or “imaging”, as used herein, indicates a visualization of a target site performed with the aid of a contrast agent present in the target site, wherein the contrast agent is configured to improve the visibility of structures or fluids by devices, processes and techniques suitable to provide a visual representation of a target site. Accordingly, a contrast agent is a substance that enhances the contrast of structures or fluids within the target site, producing a higher contrast image for evaluation. In particular, as used herein, the term “contrast agent” refers to GVs expressed in mammalian cells comprised in the target site, the GVs comprised in GVGC genetic circuits in the mammalian cells when the GVGC genetic circuit operates according to a circuit design in response to a biochemical event, as described herein.

The term “target site” as used herein indicates an environment comprising one or more targets intended as a combination of structures and fluids to be contrasted, such as cells. In particular the term “target site” refers to biological environments such as cells, tissues, and organs in vitro, in vivo or ex vivo that contain at least one target. A target is a portion of the target site to be contrasted against the background (e.g. surrounding matter) of the target site. Accordingly, as used herein a target comprises one or more mammalian cells genetically engineered to comprise one or more GVGC genetic circuits as described herein within any suitable environment in vitro, in vivo or ex vivo as will be understood by a skilled person. Exemplary target sites include collections of microorganisms in vitro as well as cells grown in an in vitro culture, including primary mammalian cells, immortalized cell lines, tumor cells, stem cells, and the like. Additional exemplary target sites include tissues and organs in an ex vivo culture and tissues, organs, or organ systems in a subject, for example, lungs, brain, kidney, liver, heart, the central nervous system, the peripheral nervous system, the gastrointestinal system, the circulatory system, the immune system, the skeletal system, and the sensory system, within a body of an individual, and additional environments identifiable by a skilled person. The term “individual” or “subject” or “patient” as used herein in the context of imaging includes a single plant, fungus or animal and in particular higher plants or animals and in particular vertebrates such as mammals and more particularly human beings.

In some embodiments, imaging the target site comprising the mammalian host can be performed by applying ultrasound to obtain an ultrasound image of the target site.

The term “ultrasound imaging” or “ultrasound scanning” or “sonography” as used herein indicates imaging performed with techniques based on the application of ultrasound. Ultrasound refers to sound with frequencies higher than the audible limits of human beings, typically over 20 kHz. Ultrasound devices typically can range up to the gigahertz range of frequencies, with most medical ultrasound devices operating in the 1 to 18 MHz range. The amplitude of the waves relates to the intensity of the ultrasound, which in turn relates to the pressure created by the ultrasound waves. Applying ultrasound can be accomplished, for example, by sending strong, short electrical pulses to a piezoelectric transducer directed at the target. Ultrasound can be applied as a continuous wave, or as wave pulses as will be understood by a skilled person.
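By way of illustration only, the relationship between the frequency range recited above and the length scale of the acoustic wave can be sketched with the relation wavelength = speed of sound / frequency. The speed of sound of 1540 m/s is an assumed average for soft tissue that is not stated in the present disclosure, and the function name is hypothetical.

```python
# Wavelength of ultrasound in soft tissue for a given transmit frequency.
# ASSUMPTION: 1540 m/s is a commonly used average speed of sound in soft
# tissue; it is not a value taken from the present disclosure.

SPEED_OF_SOUND_M_PER_S = 1540.0


def wavelength_um(frequency_hz: float,
                  speed_m_per_s: float = SPEED_OF_SOUND_M_PER_S) -> float:
    """Return the acoustic wavelength in micrometers."""
    return speed_m_per_s / frequency_hz * 1e6


# Ends of the 1 to 18 MHz range typical of medical ultrasound devices.
for f_mhz in (1.0, 18.0):
    print(f"{f_mhz:5.1f} MHz -> {wavelength_um(f_mhz * 1e6):8.1f} um")
```

Under this assumption the wavelength falls from roughly a millimeter and a half at 1 MHz to below 100 μm at 18 MHz, which is why higher transmit frequencies are associated with finer spatial detail.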

Accordingly, the wording “ultrasound imaging” as used herein refers in particular to the use of high frequency sound waves, typically broadband waves in the megahertz range, to image structures in the body. Ultrasound images can be up to three-dimensional. In particular, ultrasound imaging typically involves the use of a small transducer (probe) transmitting high-frequency sound waves to a target site and collecting the sounds that bounce back from the target site, the collected sound being provided to a computer that uses it to create an image of the target site. Ultrasound imaging allows detection of the function of moving structures in real-time. Ultrasound imaging works on the principle that different structures/fluids in the target site will attenuate and return sound differently depending on their composition. A contrast agent sometimes used with ultrasound imaging is microbubbles created by an agitated saline solution, which work due to the drop in density at the interface between the gas in the bubbles and the surrounding fluid, which creates a strong ultrasound reflection. Ultrasound imaging can be performed with conventional ultrasound techniques and devices displaying 2D images as well as three-dimensional (3-D) ultrasound that formats the sound wave data into 3-D images. In addition to 3D ultrasound imaging, ultrasound imaging also encompasses Doppler ultrasound imaging, which uses the Doppler Effect to measure and visualize movement, such as blood flow rates. Types of Doppler imaging include continuous wave Doppler, where a continuous sinusoidal wave is used; pulsed wave Doppler, which uses pulsed waves transmitted at a constant repetition frequency; and color flow imaging, which uses the phase shift between pulses to determine velocity information which is given a false color (such as red=flow towards viewer and blue=flow away from viewer) superimposed on a grey-scale anatomical image. Ultrasound imaging can use linear or non-linear propagation depending on the signal level. Harmonic and harmonic transient ultrasound response imaging can be used for increased axial resolution, as harmonic waves are generated from non-linear distortions of the acoustic signal as the ultrasound waves insonate tissues in the body. Other ultrasound techniques and devices suitable to image a target site using ultrasound, including non-linear ultrasound imaging such as AM, PI, and AMPI, would be understood by a skilled person.

Types of ultrasound imaging of biological target sites include abdominal ultrasound, vascular ultrasound, obstetrical ultrasound, hysterosonography, pelvic ultrasound, renal ultrasound, thyroid ultrasound, testicular ultrasound, and pediatric ultrasound as well as additional ultrasound imaging as would be understood by a skilled person.

Applying ultrasound refers to sending ultrasound-range acoustic energy to a target. The sound energy produced by the piezoelectric transducer can be focused by beamforming, through transducer shape, lensing, or use of control pulses. The soundwave formed is transmitted to the body, then partially reflected or scattered by structures within a body; larger structures typically reflecting, and smaller structures typically scattering. The return sound energy reflected/scattered to the transducer vibrates the transducer and turns the return sound energy into electrical signals to be analyzed for imaging. The frequency and pressure of the input sound energy can be controlled and are selected based on the needs of the particular imaging task and, in some methods described herein, collapsing GVs. To create images, particularly 2D and 3D imaging, scanning techniques can be used where the ultrasound energy is applied in lines or slices which are composited into an image.

In some embodiments, the ultrasound imaging herein described can comprise collapsing a GV type expressed in the genetically engineered mammalian cell by applying collapsing ultrasound to the target site and/or imaging a GV type in the contrast agent by applying imaging ultrasound to the target site.

In some embodiments, a method is described to provide imaging of one or more biochemical events in a mammalian cell comprised in an imaging target site, the method comprising:

introducing into the mammalian cell a genetically engineered Gas Vesicle expression system (GVES) herein described in which the gvp genes encode for proteins of a Gas Vesicle (GV) type, wherein the GV type is a reportable molecular component of a gas vesicle reporting (GVR) genetic circuit, in which molecular components are connected one to another in accordance with a circuit design by activating, inhibiting, binding or converting reactions to form a fully connected network of interacting components, wherein in the GVR genetic circuit an expression of the GV type or an intracellular spatial translocation of the GV type occurs when the GVR genetic circuit operates according to the circuit design in response to the biochemical event,

the introducing performed for a time and under condition to allow expression of the gvp genes and production of the GV type in the mammalian cell when the GVR genetic circuit operates according to the circuit design; and

imaging the target site comprising the mammalian host by applying an imaging ultrasound to the target site at a peak positive pressure below a collapse pressure of the GV type, increasing step-wise the peak positive pressure to above the collapse pressure of the GV type, taking image frames before, during, and after the step-wise increase, and performing signal separation on the image frames to image the GV type.
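By way of illustration only, the imaging step above can be sketched as a pressure schedule followed by a per-pixel least-squares unmixing of two temporal templates. The present passage does not specify the signal separation algorithm, so everything in the sketch is an assumption: the single pressure step, the fractions of the collapse pressure, the shape of the two templates (a transient at the first supra-collapse frame for the GV signal, and a pressure-proportional response for background scatterers), and all function names.

```python
# Sketch: stepped-pressure acquisition and template-based signal separation.
# ASSUMPTIONS: one pressure step, idealized templates, noiseless data.
# This is not presented as the signal separation used in the disclosure.


def pressure_ramp(collapse_pressure, n_pre=3, n_post=3, low=0.5, high=1.5):
    """Peak positive pressure per frame: below collapse, then stepped above."""
    pre = [low * collapse_pressure] * n_pre
    post = [high * collapse_pressure] * (1 + n_post)
    return pre + post


def build_templates(ramp, collapse_pressure):
    """Return (gv_template, background_template) over the frame sequence."""
    first_high = next(i for i, p in enumerate(ramp) if p > collapse_pressure)
    # GV signal: transient present only in the frame where collapse occurs.
    gv = [1.0 if i == first_high else 0.0 for i in range(len(ramp))]
    # Background: linear scatterer whose echo scales with applied pressure.
    peak = max(ramp)
    background = [p / peak for p in ramp]
    return gv, background


def unmix_pixel(series, gv, background):
    """Least-squares weights (gv_weight, background_weight) for one pixel."""
    aa = sum(g * g for g in gv)
    bb = sum(b * b for b in background)
    ab = sum(g * b for g, b in zip(gv, background))
    ay = sum(g * y for g, y in zip(gv, series))
    by = sum(b * y for b, y in zip(background, series))
    det = aa * bb - ab * ab  # non-zero when the templates are not collinear
    return (ay * bb - by * ab) / det, (by * aa - ay * ab) / det


ramp = pressure_ramp(collapse_pressure=600.0)  # hypothetical value, in kPa
gv_t, bg_t = build_templates(ramp, collapse_pressure=600.0)

# Synthetic pixel containing 2 units of GV signal and 3 units of background.
pixel = [2.0 * g + 3.0 * b for g, b in zip(gv_t, bg_t)]
gv_weight, bg_weight = unmix_pixel(pixel, gv_t, bg_t)
```

Applying the unmixing to every pixel of the frame stack yields a map of GV weights, which plays the role of the GV-specific image in this sketch.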

In some embodiments, a method is described to label a target mammalian host, the method comprising:

introducing into the target mammalian host a genetically engineered Gas Vesicle expression system (GVES) herein described in which the gvp genes encode for proteins of a Gas Vesicle (GV) type, the introducing performed for a time and under condition to allow expression of the gvp genes and production of the GV type in the mammalian cell, wherein the GV type is a reportable molecular component of a gas vesicle reporting (GVR) genetic circuit, in which molecular components are connected one to another in accordance with a circuit design by activating, inhibiting, binding or converting reactions to form a fully connected network of interacting components, wherein in the GVR genetic circuit an expression of the GV type or an intracellular spatial translocation of the GV type occurs when the GVR genetic circuit operates according to the circuit design in response to a trigger molecular component within the target mammalian host.

In the method, the introducing is performed under conditions resulting in presence of the trigger molecular component in the target mammalian host. In some embodiments, the method can further comprise imaging the target site comprising the target mammalian host by applying an imaging ultrasound to the target site at a peak positive pressure below a collapse pressure of the GV type, increasing step-wise the peak positive pressure to above the collapse pressure of the GV type, taking image frames before, during, and after the step-wise increase, and performing signal separation on the image frames to image the GV type.

The ability of GVs to act as a contrast agent for both ultrasound and MRI allows them to act as an acoustomagnetic reporter, thus creating possibilities for multimodal imaging. In some embodiments herein described, when collapsing ultrasound is used in combination with MRI imaging, acoustically collapsing a GV type expressed in a mammalian cell can remotely erase the GV type in situ to enable a background-free magnetic resonance imaging of a target site. The background-free magnetic resonance imaging removes background noise posed by background contrast from endogenous sources [33, 34] by allowing GV types to be identified specifically based on their acoustic responses.

Accordingly, in various embodiments herein described, imaging of a biochemical event and/or labeling of a mammalian cell can be performed by multiplex imaging as will be understood by a skilled person upon reading of the present disclosure.

In methods herein described, administration of one or more genetically engineered mammalian cell types comprising one or more GVR genetic circuits to a target site to be imaged can be performed in any way suitable to deliver the one or more mammalian cells comprising a GVR genetic circuit to the target site to be imaged.

In some embodiments, in which the target site is the body of an individual or a part thereof, the one or more genetically engineered mammalian cell types comprising a GVR genetic circuit can be administered to the target site locally or systemically.

The wording “local administration” or “topical administration” as used herein indicates any route of administration by which the one or more genetically engineered mammalian cell types comprising a GVR genetic circuit is brought in contact with the body of the individual, so that the resulting location of the one or more genetically engineered mammalian cell types comprising a GVR genetic circuit in the body is topical (limited to a specific tissue, organ or other body part where the imaging is desired). Exemplary local administration routes include injection into a particular tissue by a needle, gavage into the gastrointestinal tract, and spreading a solution containing the one or more genetically engineered mammalian cell types comprising a GVR genetic circuit on a skin surface.

The wording “systemic administration” as used herein indicates any route of administration by which the one or more genetically engineered mammalian cell types comprising a GVR genetic circuit is brought in contact with the body of the individual, so that the resulting location of the one or more genetically engineered mammalian cell types comprising a GVR genetic circuit in the body is systemic (not limited to a specific tissue, organ or other body part where the imaging is desired). Systemic administration includes enteral and parenteral administration. Enteral administration is a systemic route of administration where the substance is given via the digestive tract, and includes but is not limited to oral administration, administration by gastric feeding tube, administration by duodenal feeding tube, gastrostomy, enteral nutrition, and rectal administration. Parenteral administration is a systemic route of administration where the substance is given by a route other than the digestive tract and includes but is not limited to intravenous administration, intra-arterial administration, intramuscular administration, subcutaneous administration, intradermal administration, intraperitoneal administration, and intravesical infusion.

Accordingly, in some embodiments of methods herein described, administering the one or more genetically engineered mammalian cell types comprising a GVR genetic circuit can be performed topically or systemically by intradermal, intramuscular, intraperitoneal, intravenous, subcutaneous, intranasal, rectal, vaginal, and oral routes. In particular, the one or more genetically engineered mammalian cell types comprising a GVR genetic circuit can be administered by infusion or bolus injection, and can optionally be administered together with other biologically active agents. In some embodiments of methods herein described, administering the one or more genetically engineered mammalian cell types comprising a GVR genetic circuit can be performed by injecting the one or more genetically engineered mammalian cell types comprising a GVR genetic circuit, such as in a body cavity or lumen. Upon expression of one or more GV types in one or more genetically engineered mammalian cell types comprised in the target site, the target site can be contrast imaged.

Accordingly, in some embodiments, a vector comprising one or more genetic molecular components of a GVR genetic circuit is described, wherein the vector is configured to introduce the one or more genetic molecular components comprised in a GVR genetic circuit into a mammalian cell.

The term “vector” indicates a molecule configured to be used as a vehicle to artificially carry foreign genetic material into a cell, where it can be replicated and/or expressed. An expression vector is configured to carry and express the material in a cell under appropriate conditions. In some embodiments, a suitable vector can comprise a recombinant plasmid, a recombinant non-viral vector, or a recombinant viral vector. Vectors described herein can comprise suitable promoters, enhancers, post-transcriptional and post-translational elements for expression in mammalian cells that are identifiable by those skilled in the art. Vectors suitable for transduction of mammalian cells are known to those skilled in the art. Exemplary vectors for transformation of a mammalian cell with genetic molecular components comprising GV gene clusters are described herein in the Examples.

Accordingly, in some embodiments herein described, a genetically engineered mammalian cell and in particular a genetically engineered mammalian cell comprising one or more GVR genetic circuits is described.

In embodiments herein described, a composition is provided. The composition comprises one or more genetic molecular components of a GVR genetic circuit, vectors, or genetically engineered mammalian cells described herein together with a suitable vehicle.

The term “vehicle” as used herein indicates any of various media acting usually as solvents, carriers, binders or diluents for the one or more genetic molecular components, vectors, or mammalian cells herein described that are comprised in the composition as an active ingredient. In particular, the composition including the one or more genetic molecular components, vectors, or mammalian cells herein described can be used in one of the methods or systems herein described.

In some embodiments, the GVGC comprised in a genetic molecular component of a GVR genetic circuit can be engineered (e.g. by modifying the related gvp genes) to produce GV types with altered mechanical, acoustic, surface and targeting properties in order to achieve enhanced harmonic responses and multiplexed imaging to be better distinguished from background tissues. In particular, in some embodiments, a GV can be engineered to tune the related acoustic properties. In particular, the engineering can be performed by genetically engineering a GV having an acoustic collapse pressure aP₀ to obtain a variant GV with a critical collapse pressure aP₁ lower than aP₀.

In particular, in order to tune the acoustic collapse properties of the GV, one changes the structural proteins of the GV shell, for example by selecting proteins that make the GV shell longer, rounder, thicker, etc., or by adding proteins to the shell that make it structurally stronger. Changes in the shape, size, and durability of the GV shell change its acoustic properties as will be understood by a skilled person.

Accordingly, in embodiments described herein, GVR genetic circuits comprising genetically-encoded GV types can be used together with contrast-enhanced imaging techniques such as ultrasound imaging and/or MRI to detect the location of and/or dynamic biochemical events in mammalian cells in an imaging target site, wherein the mammalian cells have been genetically engineered to comprise one or more GVR genetic circuits described herein. In some exemplary embodiments, this allows monitoring the activity of various natural and engineered signaling circuits in mammalian cells.

In some exemplary embodiments described herein, imaging of engineered mammalian cells expressing GV types in vivo allows imaging of engineered mammalian cells in target sites. Conventional reporters based on fluorescent and luminescent proteins or radionuclide capture suffer from the poor penetration of light into tissue or the need to administer radioactive tracers [35-37]. In contrast to these techniques, ultrasound and MRI are widely available, inexpensive, radiation-free technologies capable of noninvasively imaging deep tissues [38]. For example, the spatial resolution of ultrasound is routinely on the order of 100 μm [39, 40] and can approach the single-micron level with recently developed super-resolution techniques [41]. With these performance characteristics and the ability to place signals within an anatomical context, ultrasound is an ideal technique for imaging cells in vivo.

As described herein, GVESs and related polynucleotide constructs, GVR genetic circuits, vectors, genetically engineered mammalian cells, compositions, methods and systems can be used in several embodiments to detect biochemical events in mammalian cells. In particular embodiments, the GVES and related genetic circuits, vectors, genetically engineered mammalian cells, compositions, methods and systems described herein enable cell imaging inside mammalian hosts.

In some embodiments described herein, GV type-expressing mammalian cells can be visualized in vivo in settings relevant to tracking of cells such as immune cells, circulating tumor cells, stem cells, and blood cells, tracking of cellular parts around the body such as exosomes, differentiation of stem cells and progenitor cells, genetic changes to cells, and additional settings identifiable by a skilled person. In exemplary embodiments described herein, expression of GV types can make mammalian cells visible to ultrasound at volumetric concentrations below 0.5%, allowing dynamic imaging of gene expression and other biochemical events, and allowing visualization in vivo, such as in tumor xenografts as shown in the Examples.

In some embodiments described herein, engineered gas vesicle gene clusters are used as reporter genes for ultrasound, giving this widely used noninvasive imaging modality the ability to visualize cells inside living animals with sub-100 μm resolution. In several embodiments described herein, transformation with GVES systems of the disclosure allows mammalian cells to be detected at concentrations above 3 mammalian cells per ultrasound voxel, making this technology relevant to a broad range of studies and demonstrating the ability of GVGC-expressing mammalian cells to be detected within living animals at relevant concentrations.

In some embodiments, the GVs and variants thereof comprised in GVR genetic circuits described herein can be used as a contrast agent in the contrast-enhanced imaging methods herein described.

In particular, a combination of different GV types and/or variants thereof comprised in GVR genetic circuits can be used as contrast agents, each expressed GV exhibiting a different acoustic collapse profile with progressively decreased midpoint collapse pressure values. In some cases, the percentage difference between the midpoint collapse pressure values of any given two expressed GV types is at least twenty percent.
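By way of illustration only, the twenty percent separation criterion can be sketched as a pairwise check over a panel of GV types. The midpoint collapse pressure values and GV type names below are hypothetical, and the definition of percentage difference (taken here relative to the larger of the two values) is an assumption, since the present passage does not define it.

```python
# Sketch: check that every pair of GV types in a panel has midpoint collapse
# pressures differing by at least a minimum percentage.
# ASSUMPTIONS: hypothetical pressure values; percentage difference is
# computed relative to the larger value of each pair.

from itertools import combinations


def percent_difference(p1: float, p2: float) -> float:
    """Percentage difference between two pressures, relative to the larger."""
    return abs(p1 - p2) / max(p1, p2) * 100.0


def well_separated(midpoints_kpa: dict, min_percent: float = 20.0) -> bool:
    """True if all pairs of GV types meet the minimum percentage difference."""
    return all(
        percent_difference(a, b) >= min_percent
        for a, b in combinations(midpoints_kpa.values(), 2)
    )


# Hypothetical panel of three GV types (midpoint collapse pressure in kPa).
panel = {"GV_type_1": 700.0, "GV_type_2": 500.0, "GV_type_3": 300.0}
panel_ok = well_separated(panel)

# Hypothetical pair that is too close to be distinguished by this criterion.
close_pair_ok = well_separated({"GV_type_a": 500.0, "GV_type_b": 550.0})
```

A panel passing this check can, in principle, be collapsed sequentially by stepping the applied pressure between the midpoints, which is the basis for the multiplexing described above.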

As mentioned above, the GV gene cluster and related GVR circuit, molecular component, polynucleotidic constructs, vectors, cells and compositions herein described can be provided as a part of systems to perform any of the above mentioned methods. The systems can be provided in the form of kits of parts. In a kit of parts, one or more of the hybrid GV gene cluster and related GVR circuit, molecular component, polynucleotidic constructs, vectors, cells and other reagents to perform the methods herein described are comprised in the kit independently. The hybrid GV gene cluster and related GVR circuit, molecular component, polynucleotidic constructs, vectors, and cells can be included in one or more compositions, and each of the hybrid GV gene cluster and related GVR circuit, molecular component, polynucleotidic construct, vector and cell is in a composition together with a suitable vehicle.

In particular, the components of the kit can be provided, with suitable instructions and other necessary reagents, in order to perform the methods herein disclosed. The kit will normally contain the compositions in separate containers. Instructions, for example written or audio instructions, on paper or electronic support such as tapes or CD-ROMs, for carrying out the assay, will usually be included in the kit. The kit can also contain, depending on the particular method used, other packaged reagents and materials (such as wash buffers and the like).

The genetically engineered GVES, and related genetic circuits, vectors, genetically engineered mammalian cells, compositions, methods and systems herein described can be used in several embodiments to provide magnetic resonance imaging with enhanced contrast and molecular sensitivity at sub-nanomolar concentration.

The genetically engineered GVES, and related genetic circuits, vectors, genetically engineered mammalian cells, compositions, methods and systems herein described can be used in connection with various applications wherein contrast-enhanced imaging of a target site is desired. For example, the genetically engineered GVES, and related genetic circuits, vectors, genetically engineered mammalian cells, compositions, methods and systems herein described can be used for visualization of mammalian cells that are part of or introduced into a mammalian host, facilitating for example the study of the mammalian microbiome and the development of diagnostic and therapeutic cellular agents, among other advantages identifiable by a skilled person, in medical applications as well as diagnostics applications. Additional exemplary applications include uses of the genetically engineered GVES, and related genetic circuits, vectors, genetically engineered mammalian cells, compositions, methods and systems herein described in several fields including basic biology research, applied biology, bio-engineering, bio-energy, medical research, medical diagnostics, therapeutics, and in additional fields identifiable by a skilled person upon reading of the present disclosure.

Further details concerning the genetically engineered GVES, and relatedgenetic circuits, engineered mammalian cells and methods of the presentdisclosure will become more apparent hereinafter from the followingdetailed disclosure of examples by way of illustration only withreference to an experimental section.

EXAMPLES

The polynucleotide constructs, and related genetic circuits, vectors, genetically engineered mammalian cells, compositions, methods and systems herein disclosed are further illustrated in the following examples, which are provided by way of illustration and are not intended to be limiting.

In particular, the following examples illustrate exemplary methods and protocols for providing and using polynucleotide constructs, and related genetic circuits, vectors, genetically engineered mammalian cells, compositions, methods and systems. A person skilled in the art will appreciate the applicability and the necessary modifications to adapt the features described in detail in the present section, to additional genetically engineered GVES, and related genetic circuits, vectors, genetically engineered mammalian cells, compositions, methods and systems according to embodiments of the present disclosure.

The following materials and methods were used in the exemplary embodiments reported in this section.

Chemicals, Cell Lines and Synthesized DNA:

All chemicals were purchased from Sigma Aldrich unless otherwise noted. HEK293T and CHO-K1 cell lines were ordered from American Type Culture Collection (ATCC) and HEK293 tetON cells and CHO tetON cells were purchased from Clontech (Takara Bio). Synthetic DNA was ordered from Twist Bioscience.

Cloning

Monocistronic plasmids used for transient transfection of gas vesicle genes into HEK293T cells used the pCMVSport backbone, and codon-optimized gas vesicle genes were assembled in each plasmid using Gibson assembly. To test the effect of N- and C-terminal p2A modification, each B. megaterium gas vesicle gene on the pNL29 plasmid (Addgene 91696) was individually cloned. To test the N-terminal modification, a CCT codon was introduced by mutagenesis immediately following the start codon. To test the C-terminal modification, a linker-p2A sequence (GGAGCGCCAGGTTCCGGG-GCTACTAACTTCAGCCTCCTTAAACAGGCCGGCGACGTGGAAGAGAATCCTGGC) (SEQ ID NO: 32) was introduced by mutagenesis upstream of the stop codon for each gene.

The PiggyBac transposon system (System Biosciences) was used to genomically integrate the ARG cassettes. To clone the ARG cassettes into the PiggyBac transposon backbone, the plasmid was first restriction digested using SpeI/HpaI and the ARG cassettes were Gibson assembled into the backbone. For tetracycline inducible expression, the CMV promoter upstream of the GV genes was replaced with the TRE3G promoter.

Cell Culture, Transient Transfection and TEM Analysis

HEK293T and CHO-K1 cells were cultured in DMEM with 10% FBS and penicillin/streptomycin and seeded in a 6-well plate for transfection experiments. When the cells reached 70-80% confluency, 2 μg of total DNA (plasmids encoding gas vesicle genes) was transiently transfected into the culture using 2.58 μg polyethyleneimine per μg DNA for 12-18 hours. Cells were allowed to express the recombinant proteins for 72 hours.

Cells expressing gas vesicles in 6-well plates were lysed with 400 μL of Solulyse-M per well for one hour at 4° C. The lysate was then transferred to 2 mL tubes, diluted with 1.2 mL of 10 mM HEPES buffer at pH 8.0 and centrifuged overnight at 300 g at 8° C. Then, 60 μL of the supernatant was transferred to a fresh tube to be analyzed using transmission electron microscopy (TEM).

From this top fraction, 2 μL of sample was added to Formvar/carbon 200 mesh grids (Ted Pella) that were rendered hydrophilic by glow discharging (Emitek K100X). The samples were then stained with 2% uranyl acetate. The samples were imaged on a FEI Tecnai T12 transmission electron microscope equipped with a Gatan Ultrascan CCD.

Genomic Integration and FACS (and 96 Well Plate Monoclonal)

HEK293 tetON and CHO tetON cells were used for genomic integration of the mammalian ARG. The cells were cultured in a 6-well plate containing 2 mL DMEM with 10% tetracycline-free FBS (Clontech) and penicillin/streptomycin. Cells were transfected with the PiggyBac transposon backbone containing the ARG genes and the PiggyBac transposase plasmid at a transposon:transposase molar ratio of 2.5:1. Transfection was conducted using the parameters mentioned above and the cells were allowed to incubate for 72 hours. Cells were induced with 1 μg/mL of doxycycline 24 hours prior to FACS. To obtain a polyclonal ARG-expressing cell population, the top 10% brightest fluorescent positive cells were sorted. For monoclonal cell lines, 576 cells from the 10% brightest fluorescent positive cell population were sorted into individual wells of 96-well plates and the surviving 30 cells were analyzed.
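As an illustration only, a 2.5:1 molar ratio can be converted into plasmid masses once the plasmid sizes are known, because the mass of a plasmid scales with the number of moles times its length. The following minimal Python sketch assumes that the 2 μg total DNA and 2.58 μg polyethyleneimine per μg DNA stated above for transient transfection also apply here. The function names and the plasmid sizes in the usage lines are hypothetical and are not taken from the present disclosure.

```python
def split_dna_mass(total_ug, molar_ratio, transposon_bp, transposase_bp):
    """Split a total DNA mass between two plasmids for a target molar ratio.

    molar_ratio is transposon:transposase (e.g. 2.5 for 2.5:1).
    Plasmid sizes in base pairs are inputs; they are not given in the text.
    """
    if min(total_ug, molar_ratio, transposon_bp, transposase_bp) <= 0:
        raise ValueError("all inputs must be positive")
    # Mass is proportional to (moles x plasmid length).
    weight_transposon = molar_ratio * transposon_bp
    weight_transposase = 1.0 * transposase_bp
    total_weight = weight_transposon + weight_transposase
    transposon_ug = total_ug * weight_transposon / total_weight
    return transposon_ug, total_ug - transposon_ug


def pei_mass_ug(dna_ug, pei_per_ug_dna=2.58):
    """Polyethyleneimine mass for a given DNA mass (2.58 ug PEI per ug DNA)."""
    return dna_ug * pei_per_ug_dna


if __name__ == "__main__":
    # Hypothetical plasmid sizes, for illustration only.
    tn_ug, tp_ug = split_dna_mass(2.0, 2.5, transposon_bp=12000, transposase_bp=6000)
    print(f"transposon: {tn_ug:.2f} ug, transposase: {tp_ug:.2f} ug")
    print(f"PEI: {pei_mass_ug(2.0):.2f} ug")
```

Weighting by plasmid length matters because a larger transposon plasmid needs proportionally more mass to reach the same number of molecules.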

Control mCherry-only cells were constructed similarly to ARG-expressing cells. A PiggyBac transposon plasmid containing the TRE3G promoter driving mCherry was used to make a stable cell line. After genomic integration using PiggyBac, the top 10% brightest fluorescent positive cells were sorted.

Gas Vesicle Yield Measurement, Size Distribution and Cell Sectioning with TEM

TEM analysis of gas vesicle yield and size distribution was conducted by seeding cells in 6-well plates and inducing gas vesicle expression using 1 μg/mL of doxycycline and 5 mM sodium butyrate for 72 hours. The cells were lysed using Solulyse-M and buoyancy enriched at 300 g at 8° C. overnight. The top fraction of the supernatant was fixed with 2 M urea before being added to Formvar/carbon grids. The TEM grids were washed with water before staining with 2% uranyl acetate. To calculate gas vesicle yield per cell, the total number of gas vesicles per sub-grid on the TEM grid was manually counted. Gas vesicle size distribution was quantified using FIJI.

To visualize gas vesicles inside cells, ARG-expressing cells were seeded in 6-well plates and allowed to express gas vesicles for 72 hours. The cells were fixed with 4% paraformaldehyde. Cell sectioning and electron microscopy were conducted by Oak Crest Institute of Science.

In Vitro Toxicity Assays

The viability of the ARG-expressing cells was determined using three different assays involving cellular metabolic activity (Resazurin reduction, MTT assay), quantification of cellular ATP content (CellTiter-Glo, Promega), and dye exclusion (Trypan Blue, Caisson Labs). The measurements were all quantified as percent viability compared with control cells that expressed mCherry only. For the MTT and CellTiter-Glo assays, cells were grown in 96-well plates and induced with 1 μg/mL doxycycline and 5 mM sodium butyrate for 72 hours. They were then treated with reagents according to the manufacturers' protocols. Luminescence (CellTiter-Glo) and absorbance at 540 nm (MTT) were measured using a SpectraMax M5 spectrophotometer (Molecular Devices). For the Trypan Blue assay, the cells were first grown in 6-well plates and treated with 1 μg/mL doxycycline and 5 mM sodium butyrate for 72 hours. They were then trypsinized and resuspended in media before being stained 1:1 with Trypan Blue dye. Ten μL of the solution was loaded in a disposable hemocytometer (C-chip DHC S02, Incyto) and total cell count and blue-stained dead cells were quantified by bright field microscopy.
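The normalization to mCherry-only control cells can be sketched as follows. This is a minimal illustration in Python and not the analysis code used for the reported data. The function names, the optional blank subtraction, and the example readings are assumptions introduced here.

```python
from statistics import mean


def percent_viability(sample_readings, control_readings, blank=0.0):
    """Plate-reader signal of a sample as a percentage of the control signal.

    Readings can be absorbance (MTT) or luminescence (CellTiter-Glo).
    The blank subtraction is an assumption; set blank=0.0 to disable it.
    """
    sample = mean(sample_readings) - blank
    control = mean(control_readings) - blank
    if control <= 0:
        raise ValueError("control signal must be positive after blank subtraction")
    return 100.0 * sample / control


def trypan_blue_viable_fraction(total_cells, blue_cells):
    """Fraction of cells that exclude the dye (unstained) in a hemocytometer count."""
    if total_cells <= 0 or not 0 <= blue_cells <= total_cells:
        raise ValueError("counts must satisfy 0 <= blue_cells <= total_cells, total_cells > 0")
    return (total_cells - blue_cells) / total_cells


if __name__ == "__main__":
    # Hypothetical replicate readings, for illustration only.
    arg_wells = [0.82, 0.85, 0.80]
    mcherry_wells = [0.90, 0.88, 0.91]
    print(f"{percent_viability(arg_wells, mcherry_wells):.1f}% of control")

    # Dye exclusion: viable fraction of ARG cells relative to control cells.
    arg_fraction = trypan_blue_viable_fraction(total_cells=210, blue_cells=18)
    control_fraction = trypan_blue_viable_fraction(total_cells=200, blue_cells=12)
    print(f"{100.0 * arg_fraction / control_fraction:.1f}% of control")
```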

In Vitro Ultrasound Imaging

To create phantoms for in vitro ultrasound imaging, wells were cast with molten 1% w/v agarose in PBS using a custom 3D-printed template. ARG-expressing and mCherry-only control cells were allowed to express gas vesicles using the specified inducer concentrations and expression duration. They were then trypsinized and counted via disposable hemocytometers in bright field microscopy. Next, cells were mixed at a 1:1 ratio with 50° C. agarose and loaded into the wells before solidification. Each well had a volume of 60 μl and contained 6×10⁶ cells. The phantoms were submerged in PBS, and ultrasound images were acquired using a Verasonics Vantage programmable ultrasound scanning system and L22-14v 128-element linear array transducer with a 0.10-mm pitch, an 8-mm elevation focus, a 1.5-mm elevation aperture, and a center frequency of 18.5 MHz with 67% -6 dB bandwidth (Verasonics, Kirkland, Wash.). Each frame was formed from 89 focused beam ray lines, each with a 40-element aperture and 8 mm focus. A 3-half-cycle transmit waveform at 17.9 MHz was applied to each active array element.

For each ray line, the AM code was implemented using one transmit with all elements in the aperture active followed by 2 transmits in which the odd- and then even-numbered elements were silenced. Each image contained a circular cross-section of a well with a 4 mm diameter and center positioned at a depth of 8 mm. In AM mode, signal was acquired at 0.9 MPa (2V) for 10 frames and the acoustic pressure was increased to 4.3 MPa (12V) to collect 46 frames. Thereafter, the acoustic pressure was increased to 8.3 MPa (25V) to ensure complete collapse of gas vesicles. Gas vesicle-specific signal was determined by subtracting the area under the curve of the post-collapse imaging sequence from that of the first sequence.
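The two signal-processing steps in this paragraph can be sketched in Python as follows. The sketch rests on assumptions that are not stated in the text: the three transmits of the AM code are combined by subtracting the two half-aperture echoes from the full-aperture echo (a common amplitude modulation scheme), the area under the curve is taken over frame index with the trapezoidal rule, and the two sequences being compared span the same number of frames. All function names are hypothetical.

```python
def am_residual(full, odd_silenced, even_silenced):
    """Combine the three echoes of one AM-coded ray line, sample by sample.

    For a linear scatterer the two half-aperture echoes sum to the
    full-aperture echo, so the residual is zero; a nonlinear scatterer
    leaves a nonzero residual.
    """
    if not (len(full) == len(odd_silenced) == len(even_silenced)):
        raise ValueError("echo traces must have equal length")
    return [f - (o + e) for f, o, e in zip(full, odd_silenced, even_silenced)]


def area_under_curve(values, dx=1.0):
    """Trapezoidal area under a signal-versus-frame curve (dx = frame spacing)."""
    return sum((a + b) * dx / 2.0 for a, b in zip(values, values[1:]))


def gv_specific_signal(pre_collapse, post_collapse):
    """Area under the first sequence minus area under the post-collapse sequence."""
    return area_under_curve(pre_collapse) - area_under_curve(post_collapse)


if __name__ == "__main__":
    # Hypothetical per-frame mean intensities in a region of interest.
    before = [5.0, 4.0, 3.0, 2.5]
    after = [1.0, 1.0, 1.0, 1.0]
    print(gv_specific_signal(before, after))
```

Subtracting the post-collapse sequence removes background that does not depend on intact gas vesicles, so the remaining area is attributed to the vesicles.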

Cytotoxicity Assay on Cells Exposed to Ultrasound

ARG-expressing and mCherry-only cells were cultured on custom made Mylar-bottom 24-well plates. Cells were cultured on fibronectin coated Mylar films until they reached 80% confluency and induced for gas vesicle expression for 3 days. The cells were then insonated from the bottom using a L22-14v 128-element linear array transducer (Verasonics). The transducer was mounted on a computer-controlled 3D translatable stage (Velmex). The bottom of the plates was acoustically coupled to the transducer with water and positioned 8 mm away from the transducer face. The cells were exposed to 8.3 MPa of pressure and the transducer was translated at a rate of 3.8 mm/s. The plates were returned to the incubator and allowed to rest for 24 hours. Cytotoxicity was assayed using Resazurin reduction (MTT) on cells exposed to ultrasound and compared to non-insonated negative control cells.

3D Cell Culture and In Vitro Acoustic Recovery after Collapse

ARG-expressing and mCherry-only cells were mixed in Matrigel (Corning) containing 1 μg/mL of doxycycline and 5 mM sodium butyrate. The cell-laden hydrogels were placed in a 1% agarose base to prevent cell migration out of the hydrogel and to separate the cells from the bottom of the plates for imaging. Cells were cultured for a total of 6 days and imaged every 3 days from the top using a L22-14v 128-element linear array transducer (Verasonics). The transducer was wiped with 70% ethanol and imaging was conducted in a tissue culture hood to preserve sterility. After imaging, all cells were exposed to 8.3 MPa ultrasound at a rate of 1-2 mm/s to ensure complete collapse of all gas vesicles in the cells. The culture media was changed daily and contained 1 μg/mL of doxycycline and 5 mM sodium butyrate.

In Vivo Expression of Gas Vesicles and Ultrasound Imaging

All in vivo experiments were performed on NOD SCID mice (NOD.CD17 Prkdc^(scid)/NCrCrl; Charles River), aged 10-15 weeks, under a protocol approved by the Institutional Animal Care and Use Committee of the California Institute of Technology. The lower halves of the mice were shaved to allow for fluorescence imaging and ultrasound coupling. ARG-expressing and mCherry-only cells were cultured in tetracycline-free media in T225 flasks and 10-12 million cells were trypsinized and mixed with Matrigel (Corning) containing 5 mM sodium butyrate. The ARG-expressing cell and Matrigel mixture was injected subcutaneously in the left flank of mice and the mCherry-only cell and Matrigel mixture was injected subcutaneously in the right flank of mice. Starting from the day of tumor inoculation, mice were intraperitoneally injected with 200 μl of saline containing 75 μg of doxycycline and 25 mg of sodium butyrate daily.

Example 1: Identification of Gvp Genes and Protein Sequences Through Alignment

Gvp genes and related proteins can be identified through alignment of sequences in databases or identified through wet bench experiments with an approach and techniques identifiable by a skilled person.

Taking gvpA/B as an example, the identification can be performed using the consensus sequence SSSLAEVLDRILDKGXVIDAWARVSLVGIEILTIEARVVIASVDTYLR (SEQ ID NO: 3), wherein X can be any amino acid, LDRILD (SEQ ID NO: 4), RILDKGXVIDAWARVS (SEQ ID NO: 5), wherein X can be any amino acid, and/or DTYLR (SEQ ID NO: 6), and/or exemplary gvpA and gvpB protein sequences already identified, as will be understood by a skilled person.
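As an illustration of how the short motifs above could be searched for in a candidate protein, the following Python sketch converts each motif into a regular expression in which X matches any single residue. It is a minimal sketch and not a substitute for a full alignment; the function names are hypothetical, and the treatment of a trailing asterisk as a stop symbol is an assumption.

```python
import re

# Motifs quoted in the text; X stands for any amino acid.
MOTIFS = {
    "SEQ ID NO: 4": "LDRILD",
    "SEQ ID NO: 5": "RILDKGXVIDAWARVS",
    "SEQ ID NO: 6": "DTYLR",
}


def motif_to_regex(motif):
    """Compile a motif into a regex, mapping X to a single-letter wildcard."""
    pattern = "".join("[A-Z]" if residue == "X" else re.escape(residue) for residue in motif)
    return re.compile(pattern)


def find_motifs(protein):
    """Return {motif name: 0-based start of first match} for motifs found."""
    sequence = protein.upper().rstrip("*")  # drop a trailing stop symbol if present
    hits = {}
    for name, motif in MOTIFS.items():
        match = motif_to_regex(motif).search(sequence)
        if match:
            hits[name] = match.start()
    return hits


if __name__ == "__main__":
    # The consensus sequence (SEQ ID NO: 3) with X arbitrarily set to isoleucine.
    candidate = "SSSLAEVLDRILDKGIVIDAWARVSLVGIEILTIEARVVIASVDTYLR"
    print(find_motifs(candidate))
```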

FIG. 1 shows an exemplary Clustal omega alignment of amino acid sequences of selected exemplary gvpA and gvpB proteins.

The gvpA and gvpB proteins shown are from the following species: Sa_A2, Serratia sp. ATCC 39006 gvpA2; Sa_A3, Serratia sp. ATCC 39006 gvpA3; Sc_A2, Streptomyces coelicolor gvpA2; Sc_A1, Streptomyces coelicolor gvpA1; Fc_A, Frankia sp. gvpA; Bm_B1, B. megaterium gvpB1; Mb_A, Methanosarcina barkeri gvpA; Hv_A, Halorubrum vacuolatum gvpA; Hm_A, Haloferax mediterranei gvpA; Hs_A1, Halobacterium sp. NRC-1 gvpA1; Hs_A2, Halobacterium sp. NRC-1 gvpA2; Bm_A, B. megaterium gvpA; Bm_B2, B. megaterium gvpB2; Af_A, A. flos-aquae gvpA; Ma_A; Sa_A1, Serratia sp. ATCC 39006 gvpA1.

The bottom row of FIG. 1 indicated as “Consensus” shows an exemplary consensus sequence derived from alignment of the gvpA and gvpB amino acid sequences shown.

Homology-based searching (e.g., BLAST alignment) of sequences of proteins encoded in the genome of a prokaryotic organism compared to the exemplary consensus sequence shown in FIG. 1 can be used to identify gvpA and/or gvpB protein sequences in the prokaryotic organism.

Example 2: Identification of Gvp Genes and Protein Sequences Through Phylogenesis

Gvp genes and related proteins can be identified based on phylogenetic relationships of sequences in databases or identified through wet bench experiments with an approach and techniques identifiable by a skilled person.

In particular, exemplary gvpA, gvpF and gvpN genes and proteins were identified through phylogenetic relationships as shown below.

FIG. 2 shows exemplary phylogenetic relationships of the gvpA protein sequences from the indicated prokaryotic species [1]. Table 6 lists examples of GV protein sequences from a number of prokaryotic species.

Identification of a gvpA/B protein can be performed by comparing the sequence of an unknown protein in a prokaryotic cell with that of a known gvpA sequence from the closest phylogenetic relative of the prokaryotic species, such as those indicated in the exemplary phylogenetic tree diagram in FIG. 2. Alternatively, identification of gvpA/B can be done through protein alignment algorithms (e.g. BLAST) with the gvpA/B consensus sequence provided in this document, where the protein has 60% or higher identity to this sequence.
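The 60% identity criterion can be illustrated with a deliberately simplified calculation. The Python sketch below does not run BLAST and does not allow gaps: it slides the gvpA/B consensus sequence (SEQ ID NO: 3) along the candidate protein and reports the best fraction of matching positions, treating X as matching any residue. This ungapped scan is a stand-in chosen here for brevity, and its identity values will not in general equal those reported by BLAST. The function names are hypothetical.

```python
# gvpA/B consensus sequence quoted in Example 1 (SEQ ID NO: 3); X = any amino acid.
CONSENSUS = "SSSLAEVLDRILDKGXVIDAWARVSLVGIEILTIEARVVIASVDTYLR"


def best_ungapped_identity(protein, consensus=CONSENSUS):
    """Best fraction of identical positions over all ungapped placements.

    Returns 0.0 if the protein is shorter than the consensus.
    """
    sequence = protein.upper().rstrip("*")  # drop a trailing stop symbol if present
    width = len(consensus)
    best = 0.0
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        matches = sum(1 for c, w in zip(consensus, window) if c == "X" or c == w)
        best = max(best, matches / width)
    return best


def meets_identity_threshold(protein, threshold=0.60):
    """True if the best ungapped identity reaches the threshold (60% by default)."""
    return best_ungapped_identity(protein) >= threshold


if __name__ == "__main__":
    # Aphanizomenon flos-aquae gvpA as listed in Table 6 (SEQ ID NO: 34).
    af_gvpa = ("MAVEKTNSSSSLAEVIDRILDKGIVIDAWVRVSLVGIELLAIEARIVI"
               "ASVETYLKYAEAVGLTQSAAVPA*")
    print(round(best_ungapped_identity(af_gvpa), 3), meets_identity_threshold(af_gvpa))
```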

FIG. 3 shows exemplary phylogenetic relationships of the gvpF and gvpL protein sequences from the indicated prokaryotic species [1]. In some embodiments described herein, the identification of a gvpF protein can be performed by comparing the sequence of an unknown protein in a prokaryotic cell with that of a known gvpF sequence from the closest phylogenetic relative of the prokaryotic species, such as those indicated in the exemplary phylogenetic tree diagram in FIG. 3.

FIG. 4 shows exemplary phylogenetic relationships of the gvpN protein sequences from the indicated prokaryotic species [1]. In some embodiments described herein, the identification of a gvpN protein can be performed by comparing the sequence of an unknown protein in a prokaryotic cell with that of a known gvpN sequence from the closest phylogenetic relative of the prokaryotic species, such as those indicated in the exemplary phylogenetic tree diagram in FIG. 4.

The protein sequences provided in Table 6 can also be used with protein alignment algorithms to identify gvps. When using BLAST or other tools, if the top 100 hits based on protein identity or the 100 hits with the lowest E-values are identified as “gas vesicle protein” or “gvp” or “gas vesicle structural protein”, the protein can be designated as a gas vesicle protein.
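The annotation rule above can be sketched in Python as follows. The sketch assumes that search hits have already been obtained from BLAST or another tool and parsed into tuples of description, percent identity and E-value; it does not call any search tool itself. It also reads the rule as requiring every one of the top hits to carry one of the quoted annotations, which is one possible interpretation. The function names and the example hits are hypothetical.

```python
# Annotation keywords quoted in the text.
KEYWORDS = ("gas vesicle protein", "gvp", "gas vesicle structural protein")


def is_gv_annotation(description):
    """True if a hit description contains one of the gas vesicle keywords."""
    text = description.lower()
    return any(keyword in text for keyword in KEYWORDS)


def designate_as_gvp(hits, top_n=100):
    """Apply the top-hits rule to parsed search results.

    hits: list of (description, percent_identity, evalue) tuples.
    Assumption: all of the top_n hits, ranked either by identity or by
    E-value, must carry a gas vesicle annotation.
    """
    if not hits:
        return False
    by_identity = sorted(hits, key=lambda hit: hit[1], reverse=True)[:top_n]
    by_evalue = sorted(hits, key=lambda hit: hit[2])[:top_n]
    return (all(is_gv_annotation(hit[0]) for hit in by_identity)
            or all(is_gv_annotation(hit[0]) for hit in by_evalue))


if __name__ == "__main__":
    # Hypothetical parsed hits, for illustration only.
    example_hits = [
        ("gas vesicle structural protein GvpA", 98.6, 1e-40),
        ("GvpA family gas vesicle protein", 95.1, 3e-38),
    ]
    print(designate_as_gvp(example_hits))
```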

Example 3: Identification of Gvp Genes and Proteins Through Analysis of Configuration of Gas Vesicle Gene Clusters in Prokaryotes

Identification of gvp genes and proteins can also be performed through analysis of the configuration of gas vesicle gene clusters in prokaryotes, which can be used to identify the specific genes forming a GV cluster in a microorganism, in combination with use of consensus sequences, alignment and/or phylogenetic analysis of GV clusters.

FIG. 5 shows diagrams illustrating the organization of exemplary gas vesicle gene clusters. Gas vesicle gene clusters from the indicated organisms are shown, with genes shown as block-shaped arrows, and genes of predicted similar function indicated in the same shade of grey. The direction of the transcription of genes within a gene cluster is indicated by the direction of the block-shaped arrows, and genes grouped together having block arrows pointed in the same direction are typically organized in the same operon. The scale bar indicates 1 kb [1].

In addition, FIG. 6 shows diagrams illustrating organization of exemplary gvp gene clusters, wherein each letter indicates a gvp gene, and an arrow beneath a group of letters indicates an operon, with the direction of the arrow indicating the direction of transcription [2].

To identify gvp genes and gvp gene clusters, the following methodology can be used:

1. Using the gvpA/B consensus sequence (60% or higher identity), the gvpN consensus sequence (50% or higher identity), and/or the gvp sequences provided in Table 6, identify gvp genes in the genome of the prokaryote.

2. For each gvp gene identified, test the next 10 protein coding sequences on both sides of the gene to determine whether each is a gvp gene. Using BLAST or other tools, if the top 100 hits based on protein identity or the 100 hits with the lowest E-values are identified as “gas vesicle protein” or “gvp” or “gas vesicle structural protein”, the protein can be designated as a gas vesicle protein.

3. If any of the adjacent genes are labeled as gvp genes, continue testing the next 10 protein coding sequences on both sides, moving away from the labeled gvp genes. Use the criterion of step 2 to continue identifying gvp genes. If none of the adjacent 10 genes are marked as gvp genes, continue to the next step.

4. The genes at the extreme ends mark the edges of the gene cluster, and all the genes inside are part of the gene cluster that can be tested for heterologous expression of gas vesicles in bacterial/mammalian cells. In some cases, there can be more than one gene cluster encoding gvp genes, in which case all the gene clusters are tested during heterologous expression.

In particular, the above methodology can be one way to identify gvp gene clusters in an unannotated or mis-annotated genome as will be understood by a skilled person.
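The four steps above amount to walking outward from a seed gvp gene in windows of 10 protein coding sequences until a full window contains no gvp gene. A minimal Python sketch of that walk is given below. It assumes the genome has already been reduced to an ordered list of true/false flags, one per protein coding sequence, where each flag records whether the criterion of step 2 designated that sequence as a gvp gene. The function name and the example flags are hypothetical.

```python
def find_cluster_bounds(gvp_flags, seed, window=10):
    """Return (left, right) indices of the gvp gene cluster around a seed gene.

    gvp_flags: ordered list of booleans, one per protein coding sequence.
    seed: index of a gene already identified as a gvp gene (step 1).
    window: number of neighbouring coding sequences tested per step.
    """
    if not gvp_flags[seed]:
        raise ValueError("seed must point at a gene flagged as gvp")
    last = len(gvp_flags) - 1
    left = right = seed

    # Steps 2 and 3, moving downstream: extend while the next window holds a gvp gene.
    while True:
        found = [i for i in range(right + 1, min(right + window, last) + 1) if gvp_flags[i]]
        if not found:
            break
        right = max(found)

    # Steps 2 and 3, moving upstream.
    while True:
        found = [i for i in range(max(left - window, 0), left) if gvp_flags[i]]
        if not found:
            break
        left = min(found)

    # Step 4: the outermost gvp genes mark the edges of the cluster.
    return left, right


if __name__ == "__main__":
    # Hypothetical genome of 30 coding sequences with gvp genes at four positions.
    flags = [False] * 30
    for index in (10, 12, 15, 24):
        flags[index] = True
    print(find_cluster_bounds(flags, seed=12))
```

When a genome carries more than one cluster, the same walk can be repeated from any seed gene that falls outside the bounds already found.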

Example 4: Amino Acid Sequences of Exemplary GV Proteins Including GVS and GVA Proteins

Several gvp genes and related proteins have been identified and are available in accessible databases.

In particular, Tables 6-10 show amino acid sequences of exemplary GVS (gvpA/B or gvpC) and GVA proteins from several exemplary prokaryotic species. These exemplary amino acid sequences can be used as reference amino acid sequences in some embodiments for homology-based searches for related GVS and GVA proteins.

TABLE 6Amino acid sequences of exemplary gvpA/B, gvpF, gvpF/L, gvpG, gvpJ, gvpK, gvpL, gvpN, gvpV, gvpW, gvpR, gvpS, gvpT, and gvpU proteins SEQ IDSpecies, protein; Amino acid sequence NO.: gvpA/B Ana-family-MAVEKTNSSSSLAEVIDRILDKGIVIDAWVRVSLVGIELLAIEARXV  33 consensus_gvpAIASVETYLKYAEAVGLTXSAAVPAX Aphanizomenon-flos-MAVEKTNSSSSLAEVIDRILDKGIVIDAWVRVSLVGIELLAIEARIVI  34 aquae_gvpAASVETYLKYAEAVGLTQSAAVPA* Aphanothece-MAVEKTNSSSSLGEVVDRILDKGVVVDLWVRVSLVGIELLAVEAR  35 halophytica_gvpAVVVASVETYLKYAEAVGLTSSAAVPAE* Anabaena-flos-MAVEKTNSSSSLAEVIDRILDKGIVIDAWVRVSLVGIELLAIEARIVI  36 aquae_gvpAASVETYLKYAEAVGLTQSAAVPA* Ancylobacter-MAVEKINASSSLAEVVDRILDKGVVVDAWVRVSLVGIELLAVEAR  37 aquaticus_gvpAVVVAGVDTYLKYAEAVGLTASAQAA* Aquabacter-MAVEKINASSSLAEVVDRILDKGVVVDAWVRVSLVGIELLAVEAR  38 spiritensis_gvpAVVVAGVDTYLKYAEAVGLTAGAQAA* Arthrospira-sp-PCC-MAVEKVNSSSSLAEVIDRILDKGIVIDAWVRVSLVGIELLSVEARV  39 8005_gvpAVIASVETYLKYAEAVGLTAQAAVPSV* Calothrix-sp-strain-MAVEKTNSSSSLAEVIDRILDKGIVVDAWVRVSLVGIELLAIEARIV  40 PCC-7601_gvpAIASVETYLKYAEAVGLTQSAAVPA* Dactylococcopsis-MAVEKTNSSSSLGEVVDRILDKGVVVDLWVRVSLVGIELLAVEAR  41 salina-PCC-VVIASVETYLKYAEAVGLTSSAAVPAE* 8305_gvpA1 Dolichospermum-MAVEKTNSSSSLAEVIDRILDKGIVIDAWVRVSLVGIELLAIEARIVI  42 circinale-ASVETYLKYAEAVGLTQSAAVPA* AWQC131C_gvpA Dolichospermum-MAVEKTNSSSSLAEVIDRILDKGIVIDAWVRVSLVGIELLAIEARIVI  43 lemmermannii_gvpAASVETYLKYAEAVGLTQSAAVPA Enhydrobacter-MAVEKMNASSSLAEVVDRILDKGIVIDAWVRVSLVGIELLAVEAR  44 aerosaccus_gvpA1VVVAGVDTYLKYAEAVGLTAGAEAA* Lyngbya-MAVEKVNSSSSLAEVVDRILDKGIVVDAWVRVSLVGIELLAIEAR  45 confervoides-VVIASVETYLKYAEAVGLTAQAAVPAS* BDU14195 l_gvpA Nostoc-punctiforme-MAVEKVNSSSSLAEVIDRILDKGIVIDAWVRVSLVGIELLSIEARIVI  46 PCC-73102_gvpAASVETYLRYAEAVGLTSQAAVPSAA* Nostoc-sp-PCC-MAVEKTNSSSSLAEVIDRILDKGIVVDAWVRVSLVGIELLAIEARIV  47 7120_gvpAIASVETYLKYAEAVGLTQSAAMPA* Microchaete-MAVEKTNSSSSLAEVIDRILDKGIVVDAWVRVSLVGIELLAIEARIV  48 diplosiphon_gvpAIASVETYLKYAEAVGLTQSAAVPA* Microcystis-MAVEKTNSSSSLAEVIDRILDKGIVIDAWARVSLVGIELLAIEARVV  49 
aeruginosa-NIES-IASVETYLKYAEAVGLTQSAAVPA* 843_gvpA1 Microcystis-MAVEKTNSSSSLAEVIDRILDKGIVIDAWARVSLVGIELLAIEARVV  50 aeruginosa-NIES-IASVETYLKYAEAVGLTQSAAVPA* 843_gvpA2 Microcystis-MAVEKTNSSSSLAEVIDRILDKGIVIDAWARVSLVGIELLAIEARVV  51 aeruginosa-NIES-IASVETYLKYAEAVGLTQSAAVPA* 843_gvpA3 Microcystis-flos-MAVEKTNSSSSLAEVIDRILDKGIVIDAWARVSLVGIELLAIEARVV  52 aquae-TF09_gvpAIASVETYLKYAEAVGLTQSAAVPA* Phormidium-tenue-MAVEKVNSSSSLAEVVDRILDKGIVIDAWVRVSLVGIELLAIEARV  53 NIES-30_gvpAVIASVDTYLKYAEAVGLTAQAAVPAA* Planktothrix-MAVEKVNSSSSLAEVIDRILDKGIVIDAWVRVSLVGIELLSIEARIVI  54 agardhii_gvpAASVETYLKYAEAVGLTAQAAVPSV Planktothrix-MAVEKVNSSSSLAEVIDRILDKGIVIDAWVRVSLVGIELLSIEARIVI  55 rubescens_gvpAASVETYLKYAEAVGLTAQAAVPSV* Pseudanabaena-MAVEKVNSSSSLAEVIDRILDKGIVIDAWVRVSLVGIELLSIEARVV  56 galeata-PCC-IASVETYLKYAEAVGLTASAAVPAA 6901_gvpA Stella-MAVEKINASSSLAEVVDRILDKGVVVDAWVRVSLVGIELLAVEAR  57 vacuolata_gvpAVVVAGVDTYLKYAEAVGLTAGAQTA* Trichodesmium-MAVEKVNSSSSLAEVIDRILDKGVVVDAWIRLSLVGIELLTIEARIV  58 erythraeum-VASVETYLKYAEAVGLTTLAAAPGEAAA* IMS101_gvpA3 Trichodesmium-MAVEKVNSSSSLAEVIDRILDKGVVVDAWVRLSLVGIELLTIEARI  59 erythraeum-VIASVETYLKYAEAVGLTTLAAEPAA* IMS101_gvpA4 Tolypothrix-sp.-PCC-MAVEKTNSSSSLAEVIDRILDKGIVVDAWVRVSLVGIELLAIEARIV  60 7601_gvpA1IASVETYLKYAEAVGLTQSAAVPA* Tolypothrix-sp.-PCC-MAVEKTNSSSSLAEVIDRILDKGIVVDAWVRVSLVGIELLAIEARIV  61 7601_gvpA2IASVETYLKYAEAVGLTQSAAVPA* Halo-family-MAQPDSSSLAEVLDRVLDKGVVVDVWARXSLVGIEILTVEARVV  62 consensus_gvpAAASVDTFLHYAEEIAKIEQAELTAGAEA-XPAPEA Halobacterium-MAQPDSSGLAEVLDRVLDKGVVVDVWARVSLVGIEILTVEARVV  63 salinarum_gvpA1AASVDTFLHYAEEIAKIEQAELTAGAEAAPEA Halobacterium-MAQPDSSSLAEVLDRVLDKGVVVDVWARISLVGIEILTVEARVVA  64 salinarum_gvpA2ASVDTFLHYAEEIAKIEQAELTAGAEAPEPAPEA Halobacterium-MAQPDSSGLAEVLDRVLDKGVVVDVWARVSLVGIEILTVEARVV  65 salinarum-NRC-AASVDTFLHYAEEIAKIEQAELTAGAEAAPEA* l_gvpA1 Halobacterium-MAQPDSSSLAEVLDRVLDKGVVVDVWARISLVGIEILTVEARVVA  66 salinarum-NRC-ASVDTFLHYAEEIAKIEQAELTAGAEAPEPAPEA* l_gvpA2 
Haloferax-MVQPDSSSLAEVLDRVLDKGVVVDVWARISLVGIEILTVEARVVA  67 mediterranei-ATCC-ASVDTFLHYAEEIAKIEQAELTAGAEAAPTPEA* 33500_gvpA Halogeometricum-MAQPDSSSLAEVLDRVLDKGVVVDVWARVSLVGIEILTVEARVV  68 borinquense-DSM-AASVDTFLHYAEEIAKIEQAELTATAEAAPTPEA* 11551_gvpA Halopenitus-persicus-MAQPDSSGLAEVLDRVLDKGVVVDVWARVSLVGIEILTVEARVV  69 strain-DC30_gvpAAASVDTFLHYAEEIAKIEQAELTAGAEAAPEA Haloquadratum-MAQPDSSSLAEVLDRVLDKGIVVDTFARISLVGIEILTVEARVVVA  70 walsbyi-C23_gvpASVDTFLHYAEEIAKIEQAELTAGAEA* Halorubrum-MAQPDSSSLAEVLDRVLDKGVVVDVYARLSLVGIEILTVEARVVA  71 vacuolatum-strain-ASVDTFLHYAEEIAKIEQAELTAGAEAAPTPEA* DSM-8800_gvpA Halopiger-MAQPQRRPDSSSLAEVLDRILDKGVVIDVWARISVVGIELLTIEAR  72 xanaduensis_gvpA1VVVASVDTFLHYAEEIAKIEQATAEGDLEELEELEVEPRPESSPQSA AE* Natrialba-magadii-MAQPQRRPDSSSLAEVLDRVLDKGVVIDIWARVSVVGIELLTVEA  73 ATCC-43099_gvpARVVVASVDTFLHYAEEIAKIEQATAEGDLEDLEELEVEPRPESSPKS ATE* Natrinema-MAQPQRRPDSSSLAEVLDRVLDKGVVIDVWARISVVGIELLTIEAR  74 pellirubrum-DSM-VVVASVDTFLHYAEEIAKIEQATAEGDLDELEELEVEPRPESSPKS 15624_gvpA1 AE*Natronobacterium- MAQPQRRPDSSSLAEVLDRILDKGVVIDVWARVSVVGIELLTIEAR  75gregoryi-SP2_gvpA1 VVVASVDTFLHYAEEIAKIEQATAEGDLEDLEELEVEPRPESSPQS ATE*Methanosaeta- MVTSTPDSSSLAEVLDRILDKGIVVDVWARVSLVGIEILTVEARVV  76thermophila_gvpA1 VASVDTFLHYSEEMAKIEQAAIAAAPSA* Methanosaeta-MVTSTPDSSSLAEVLDRILDKGIVVDVWARVSLVGIEILTVEARVV  77 thermophila_gvpA2VASVDTFLHYSEEMAKIEQAAIAAAPGVPA* Methanosarcina-MVSQSPDSSSLAEVLDRILDKGIVVDVWARVSLVGIEILAIEARVV  78 barkeri-3_gvpA1VASVDTFLHYAEEITKIEIAAKEEKPAIAA* Methanosarcina-MVSQSPDSSSLAEVLDRILDKGIVVDTWARVSLVGIEILAIEARVV  79 vacuolata_gvpA1VASVDTFLHYAEEITKIEIAAREEKPVIAA* Methanosarcina-MVSQSPDCSSLAEVLDRILDKGIVVDTWARVSLVGIEILAIEARVV  80 vacuolata_gvpA2VASVDTFLHYAEEITKIEIAAREEKPVIAA* Haladaptatus-MVQAEPNSSSLADVLDRILDKGVVIDVWARISVVGIEVLTVEARV  81 paucihalophilus-VVASVDTFLHYAKEMAKLERASSEDEIDFEQVEVASPEASTS* DX253_gvp A Mega-family-MSIQKSTXSSSLAEVIDRILDKGIVIDAFARVSXVGIEILTIEARVVIA  82 consensus_gvpASVDTWLRYAEAVGLL-D-VEE-GLP-RX- 
Bacillus-MSIQKSTDSSSLAEVIDRILDKGIVIDAFARVSLVGIEILTIEARVVIA  83 megaterium_gvpASVDTWLRYAEAVGLLTDKVEEEGLPGRTEERGAGLSF* Bacillus-MSIQKSTNSSSLAEVIDRILDKGIVIDAFARVSVVGIEILTIEARVVIA  84 megaterium_gvpBSVDTWLRYAEAVGLLRDDVEENGLPERSNSSEGQPRFSI* Serratia-family-MAKVQKSTDSSSLAEVVDRILDKGIVIDAWXKVSLVGIELLSIEAR  85 consensusVVIASVETYLKYAEAIGLTAXAAAPA* Burkholderia-sp-MAKVQKSTDSSSLAEVVDRILDKGIVIDVWAKVSLVGIELLSIEAR  86 Bp5365_gvpA1VVIASVETYLKYAEAIGLTATAAAPTA* Desulfobacterium-MAKVQKTTDSSSLAEVVDRILDKGIVVDAWAKISLVGIELISIEAR  87 vacuolatum-DSM-VVIASVETYLKYAEAIGLTAAAAAPA* 3385_gvpA Desulfomonile-tiedjei-MAKIAKSTDSSSLAEVVDRILDKGIVIDAWAKVSLVGIELLSVEAR  88 DSM-6799_gvpA1VVIASVETYLKYAEAIGLTASAAAPA* Isosphaera-pallida-MAKVTKSTDSSSLAEVVDRILDKGIVIDAFAKVSLVGIELLSVEAR  89 ATCC-43644_gvpA1VVIASVETYLKYAEAIGLTASAATPA* Lamprocystis-MAKVANSTDSSSLAEVVDRILDKGIVIDAWIKVSLVGIELLAIEARI  90 purpurea-DSM-VIASVETYLKYAEAIGLTAPAAAPA* 4197_gvpA1 Lamprocystis-MAKVANSTDSSSLAEVVDRILDKGIVIDAWLKVSLVGIELLAVEA  91 purpurea-DSM-RVVIASVETYLKYAEAIGLTAPAAAPA* 4197_gvpA2 Legionella-drancourtii-MAKVQKSTDSSSLAEVIDRILDKGIVIDVWAKVSLVGIELLSIEARV  92 LLAP12_gvpA1VIASVETYLKYAEAIGLTATASHPA* Psychromonas-MANVQKTTDSSGLAEVIDRILDKGIVIDAFVKVSLVGIELLSIEARV  93 Ingrahamii_gvpA1VIASVETYLKYAEAIGLTASAATPA* Psychromonas-MANVQKSTDSSGLAEVVDRILEKGIVIDAFVKVSLVGIELLSIEARV  94 Ingrahamii_gvpA4VIASVETYLKYAEAIGLTASAATPA* Serratia-39006_gvpA1MAKVQKSTDSSSLAEVVDRILDKGIVIDAWVKVSLVGIELLSIEAR  95VVIASVETYLKYAEAIGLTASAATPA* Thiocapsa-rosea-MAKVANSTDSSSLAEVVDRILDKGIVIDAWVKVSLVGIELLAIEAR  96 strain-DSM-235-VVIASVETYLKYAEAIGLTAPAAAPA* Ga0242571-11_gvpA1 Other gvpAsBradyrhizobium- MAIEKATASSSLAEVIDRILDKGVVIDAFVRVSLVGIELLSIELRAV  97oligotrophicum- VASVETWLKYAEAIGLVAQPMPA* S58_gvpA1 Desulfotomaculum-MAVKHSVASSSLVEVIDRILEKGIVIDAWARVSLVGIELLAIEARV  98 acetoxidans-DSM-VVASVDTFLKYAEAIGLTKFAAVPA* 771_gvpA1 Octadecabacter-MAVNKMNSSSSLAEVVDRILDKGVVIDAWVRVSLVGIELIAVEAR  99 antarcticus-VVIAGVDTYLKYAEAVGLTAEA* 307_gvpA1 
Octadecabacter-MAVSKMNSSSSLAEVVDRILDKGVVIDAWVRVSLVGIELIAVEAR 100 arcticus-238_gvpA1VVIAGVDTYLKYAEAVGLTAEA* Pelodictyon-luteolum- VVDAWVRMSLVGIELLAIEARV 101DSM-273_gvpA1 VVASVETYLKYAEAIGLTAKAA* Pelodictyon-luteolum-MAVEKTIGSSSLVEVIDRILDKGVVVDAWVRVSLVGIELLAIEARV 102 DSM-273_gvpA2VVASVETYLKYAEAIGLTAKAA* Pelodictyon-MSVEKTIGSSSLVEVIDRILDKGVVVDAWVRVSLVGIELLAIEARV 103 phaeoclathratiforme_VVASVETYLKYAEAIGLTAKAA* gvpA1 Rhodobacter-MAIEKSLASASIAEVIDRVLDKGIVVDAFVRISLVGIELLAIELRAV 104 capsulatus-SB-VASVETWLKYAEAIGLTVDPQTP* 1003_gvpA1 Rhodobacter-MAIEKSVASASIAEVIDRILDKGVVIDAFVRVSLVGIELIAIEVRAVV 105 sphaeroides_gvpA1ASIETWLKYAEAVGLTVDPATT* gvpF Anabaena-flos-MSIPLYLYGIFPNTIPETLELEGLDKQPVHSQVVDEFCFLYSEARQE 106 aquae_gvpFKYLASRRNLLTHEKVLEQTMHAGFRVLLPLRFGLVVKDWETIMSQLINPHKDQLNQLFQKLAGKREVSIKIFWDAKAELQTMMESHQDLKQQRDNMEGKKLSMEEVIQIGQLIEINLLARKQAVIEVFSQELNPFAQEIVVSDPMTEEMIYNAAFLIPWESESEFSERVEVIDQKFGDRLRI RYNNFTAPYTFAQLDS*Ancylobacter MSATLSAPGTANVAVEATAAADGKYLYGIIEAPAPATFDVPAIGG 107aquaticus strain RGDVVHTIALGRLAAVVSNSPRIDYDNSRRNMLAHTKVLEAVMA UV5_gvpFRHTLLPVCFGTVGSDAEVIIEKILRERRDELAGLLGQMHGRMELGLKASWREEIIFEEVLAENPAIRKLRDALVGRSPDQSHYERIQLGERIGQALQRKRQDDEERILERVRPFVHKTRLNKLIGDRMVINAAFLVDAAVESRLDASIRAMDEEWGGRLAFKYVGPVPPYNFVTITIHW* Aphanizomenon flips-MNTGLYLYGIFPDPIPETVDLQGLDKQSVHSQVVDGFSFLYSDAC 108 aquae NIES-81_gvpFQEKYLASRRNLLTHEKVLEQAMHEGFHVLLPLRFGLVVKDWETIQKQLIEPYKEQLNELFQKLAGQREVSIKILWDSKSELQAMMESNQDLKQQRDNMEGKKLKMEEIIQIGQLIESNLAARKQTVIQEFFNNLHPLAKEIIESEPMTEEMIYNAAFLIPWETESVFSERVEAIDRKFGDRL RIRYNNFTAPYTFAQLAS*Aphanothece MAEGFYLYGIFPPPGPQTIAVQGLDKQPIFSHTVEGFTFLYSEAQQS 109halophytica (strain RYLASRRNLITHTKVLEEAMEQGFRTLLPLQFGLVVPDWESVSQDPCC 7418)_gvpF LLQHQSETLQLLFQRLEGKREVSLKIYWETDAELNALLEENPDLKARRDNLEGKNLSMDEVIQIGQALEQAMERRKQEVITRFEDALIPFAVETQENDVLTETMIYNTAFLIPWESEPEFGEAVETVDAEFAPRLKI RYNNFTPPYNFVELRE*Aquabacter spiritensis MMQTDTLAPAETVAEGKYLYCLIDAPAPDTFASPGIGGRGDVVHT 110strain DSM ITVGRLAAVVSDSPRIEYENSRRNMMAHTKVLEEVMARHTMLPV 
9035_gvpFCFGTVATGPDPISGKILEGRRDELVGLLEQMRGRLELGLKATWREDVIFAEILQENPAIAKLRDSLVGRSPEKSHFERIRLGEMIGQAMERKRRDDEERILERVRPFVHKTKLNKPIGDRMILNAAVLVEAAREAGLDQAVRQMDAEWGARLSFKYVGPVPPYNFVTITIHW* Bacillus-MSETNETGIYIFSAIQTDKDEEFGAVEVEGTKAETFLIRYKDAAMV 111 megaterium_gvpFAAEVPMKIYHPNRQNLLMHQNAVAAIMDKNDTVIPISFGNVFKSKEDVKVLLENLYPQFEKLFPAIKGKIEVGLKVIGKKEWLEKKVNENPELEKVSASVKGKSEAAGYYERIQLGGMAQKMFTSLQKEVKTDVFSPLEEAAEAAKANEPTGETMLLNASFLINREDEAKFDEKVNEAHENWKDKADFHYSGPWPAYNFVNIRLKVEEK* BradyrhizobiumMSNQPIYVYGLIRAEDHQPLAVRAVGDSEQPVNIIGSGNVAALVST 112 oligotrophicumIDLPEIMPTRRHMLAHTKVLEAAMANGPVLPMRFGIIVPNPATLLR S58_gvpFVIGFRHQELRARLDEIDGRIEVALKASWDEQFMWRQLASEHPDLAVSGRTMMGRGEQQSYYDRIELGRAIGAALEERRTAARLQLLQTVTPFAVQVKELTPVDDAMFAHLALLVEKGAEPSLYQTVEALERSNDSGLKFRYVAPIPPYNFVAVTLDWEQHEQAPRR* BurkholderiaMNSRNGARYLYAVQHARDVPASLPAGIGGAAVRALTDGDVAAIV 113 thailandensis sp.SDTGLAKVRPERRHLLAHHTVIQSLAAAGTVLPVAFGTIATSEVAL Bp5365 strainRRMLRKHRNALAGELARLVDHVEMSVRLNWDVTDLFRHLIDVRP MSMB43_gvpFDLKAARDAMLALGSAVTRDDKIELGSRFERVLNEERARHAALVDEALDACCKEIRRDPPRHETEILHLTCLVRHAELGRFESGVAAASRELDDSLVLKYSGPCPPHHFVNLNMSL* Chlorobium luteolumMERDGKYIYCIIGADCECDFGPIGIGGRGDLVSTIGFEGISMVVSDH 114 DSM 273_gvpF1PLNRFVVDPDGILAHQRVIEAVMKEHESVIPVRFGTVAATPDEIRNLLDRRYGELSELLLRLRNKVEFNVTGRWHDMAAIYKEVERTHPEIKEQRARIESMRDGDGEALKQSLILDTGHQIEAALEVMKEEKFDAVASLFRKTAMASKMNRTTSPDMFMNAAFLIDRGREVEFDGIMEILGQKDADRCDYRYSGPLAIFNFVDLRILPEKWEL* Chlorobium luteolumMAHEAAEQDGLYIYGIINNSGELDFGPIGIGGREERVYAVIHNDIA 115 DSM 273_gvpF2AVVSRTVVKEFEPRRANMIAHQKVLEAVMVSHAVLPVRFSTVSPGHDDMKVEKILEEDYLRLKKLLVKMEGKKEMGLKVMANEEKVYESIITGYDNIRYLRDKLINLPPEKTHYQRVKIGELVAAALEKEVGTYKDAVLDALSPIAEEVKVNDSYGSMMVLNAAFLIRTAREEEFDRAVNALDDRYHDMMTFKYVGTLPPYNFVNISINIKGR* Chlorobium luteolumMNQSIYIYGIVNEPALAASFVETDPDIYAVASMGCSAIVENRPAIDL 116 DSM 
273_gvpF3GELDRESLARMLLQHQQTLERLMESGMQLIPLKLGTFVSSAADAACIIEDGYNLIERIFRETEDAHELEVVVKWSSFADLLQEVVSEGDVQELKREVEARQSSSTEDAIAVGRLIKEKIDRRNAALSASVLRQLGERASQSKRHETMDDEMVLNAAFLVNRGDVDAFVATVEALDSQYLNALHFRIVGPLPCYSFYTLEVTALFEEFIAEKRAVLGLDARSCEADVKKAYHAKAKVAHPDVHVPAGANNGADFTVLNEAYMTLHDYYS ALRNSASSRHGHEGQDSSSVVFSVKILN*Dactylococcopsis MTEGFYLYGIFPPPGPKTIETQGLDKQPIFSHTVEGFTFLYSEAQQS 117salina PCC 8305_gvpF RYLASRRNLITHTKVLEEAMENGSRTLLPLQFGLIVPDWETVVQDLLQHQAESLHFFLEKLEGKREVSLKIYWETNAELNALLEENPALKARRDNLEGKQLSMDEVIQIGQALEQEMEGRKQDIISRFEEVLIPFAFEIKENDVLTETMIYNTAFLINWDAESDFGEQLEAIDAEFSPRLKIRY NNFTPPYNFVELRE*Desulfobacterium MSKKNLKRNGRYLYAIIEASEEKTFGSIGMDGSDVYLIVEDKTAA 118vacuolatum_DSM VVSDVPNKKIRPQRKNIAAHHAVLNKIMEEITPLPMAFGIIADGEQ 3385_gvpFAIRKILADNRDVFREQFATVSGKVEMGMRISYDVPNIFEYFISTDSEIRAARDQYFGGNREPSQEAKLELGRMFNRQLNANREEYTNQVIEILDDYCDDIKENKCRNEQEVTSLACLINRSDQKRFEEGVFESARHFD NNFSFEYNGPWSPHNFVNILIEL*Desulfomonile tiedjei MEKATIKTTGSNGRYLYAVVPGSQERVYGCLGINGGNVYTIAAKD 119DSM 6799_gvpF VAAVVSDVPHQKIRPERRHFAAHQAVLKRVMLDGDLLPMSFGIISQGPKAVRAILSRNNKSVQQQLKRISGKAEMGIKVTWDVPNIFEYFIDVNRELREARNKLVQPNYLPTQQEKIEIGRMFEEILNLERERHTKQVERVMSKRCSEIKRSKCRTEIEVMNLSCLVDRTLLSDFEAGVLEAASHFDDSFAFDFNGPWAPHNFVDLEIDV* DesulfotomaculumMSTGRYVYCVINSIEPLTFMSGPVGNEPEGVFTVHYKELAAVVSQ 120 acetoxidans_DSMSSEEKYNVCRENTIAHQKVLEEVLVSHPLLPVRFGTVAQNEEIVKK 771_gvpF1FLLQERYAELRSMLHNVTGKVQMGLKVLWTDMKTVYQEIVEENPQIKNLKKKLESKPAETIHYEMIDLGQMVNQALLRKKEKQKEMVLKPLQKIALETKESFLYGDQMFVNADFLISRSSLDDFNAKVNELGEFFNEQALFKYIGPLPPYNFVTLYVNF* DesulfotomaculumMVKNHNTDHLKELYIYGLIGGTPFKDELEKISVIQENTPIYGVWHK 121 acetoxidans_DSMNIGFAVSAAPDYPLKDLSKESIIQLFVDHQQVLECLRQKFSLIPVKL 771_gvpF2GTVLESVTEAAAVLANNEEKFNDLLNYLKDKVELNLSVSWNDLNEVVAKIGEEDEVKKLKQSLLAQEQVSQEDLIKIGKIISFQMQQKKQAAREYIISELRNLWEDYFINEVVDENSILNLTLLAITGKVDDVNKKIEYLNQIYRDSLDFSLTKSLLPQGFSTVSIKKITMDQLLLAKDILKLPDTASLQDINAARRALLHCYHPDKNDHAAVNKVQEINAAYKLLEEYCQENSSDFNVDLITDYYIMKVIKADKSNVNSMNME* DolichospermumMNTDLAHKNFGLYLYGIFPDTIPETLEIKGLDGKSVHSQVVDGFTF 122 
circinale_gvpFLYSQACQEKYLASRRNLLAHERVLEQTMHEGFHVLLPLRFGLVVKDWETIMSQLINPHKEQLHKLFEKLAGQREVSIKILWDAKAELQAMMESNHDLRQQRDNMEGKKLSMEEVIQIGQLIESNLQARKQAVIEVFTRELNPLAQEIVVSEPMTEEMIYNAAFLIPWDSEPLFSERVESIDQKFGNRLRIRYNNFTAPYTFALLDS* EnhydrobacterMNPPEAYIAGRTAAKSVEDRKARPQDLAEGKYVYAIIACDEPREF 123 aerosaccus strainKNRGIGERGDKVHTINHRQMAAVVSDSPTIDYERSRRNMMAHTV ATCC 27094_gvpFVLEEVMKEFDLLPLRFGTVASSAESVERQLLVPRYGELSAMLEKMRGRSEFGLKAFWHEGVAFGEIVRENARVRKLRDALQGRSLEESYYQRIQLGEEVEKALTAIRARDEELILSRLRPFMRDIRTNKIISDRMVLNAAFLVERGDVPALDEAIRQLDQEFSERLMFKYVGPVPPYNFVNI AINWER* IsosphaeraMRNAPPTRPGSVTPASPGKPVIDGPARYLYAFTHDLPEGPLADLEG 124 pallida_ATCC-LPGARVVVVADGRVAAVVSPCPLGKVRPERQRVAGHHHVLKHL 43644_gvpFQDTLGKAILPASFGMVADSEEDLRALLRHHSAAIAEGLVRVQGKVEMTVKLRWAPDNVAQAVLGRDPELRQLRDQLYSNGQTPTRDQSLDLGRRFHHALERQRDHYAAYLRAALSPLLSELVEEDLRDERDLVHWACLIENQRRAGFEAALDRLAEELEDDLVLELTGPWPPHHFVDLD LDDDHDDDEEE*Legionella drancourtii MDSTSKKPAASNLYLYAIASVNENQEPISFHGIEEQPIDLVPYKDIM125 LLAP12_gvpF LVVSNLSKKKVRPERKNVAVHHAVLNHLMKHNTSMLPIRFGMIADNRKEVQRLLTINYDMLHTKLKMMAGRVEMGVSLSWDVPNIFEYLLNRHSQLRETRDKLLANPAHEPSRDEKIEIGALFSQILDEEREVYTDTILSLLSPVCCDVVKSTYRNDTEIMNIFCLISAARRDEFEEKIIEASTILDDNFVIKYTGPWPPHNFSKLNLSLE* Lyngbya confervoidesMPQLLYLYGIFPAPGPQDLEVQGLDQQPIHTHIIDEFVFLYSVAQQE 126 BDU141951_gvpFRYLASRKNLLGHERVLEAAMKVGYRTLLPLQFGLIIETWDRVIKELITPRGDALKRLFAKLEGRREVSVKLLWGPDAELNQLMEEDAGLRAERDRLEGQQLSMDQIVDIGQAIETAMTERKDDVINAFRQRLNALAIEVLENDPLTDAMIYNTAYLIPWEDEVKFSQAIEELDEQFEDRLRI RYNNFTAPYNFAQLDQLS*Microcystis aeruginosa MTVGLYLYGIFPEPVPDGLVLQGIDNEPVHSEMIEGFSFLYSAAHK127 NIES-843_gvpF EKYLASRRYLICHEKVLETVMEAGFTTLLPLRFGLVIKTWESVTEQLISPYKTQLKELFAKLSGQREVSIKIFWDNQWELQAALESNPKLKQERDAMMGKNLNMEEIIHIGQLIEATVLQRKQDIIQVFRDQLNHRAQEVIESDPMTDDMIYNAAYLIPWEQEPEFSQNVEAIDQQFGDRLRI RYNNLTAPYTFAQLV*Nostoc punctiforme MSFYIYGILTLPAPQNLNLEGLDRQPVQIKILDDFAVIYSEAQQERY 128ATCC 29133_gvpF 
LASRRNLLSHEKVLEEIMQAGDRYLLPVQFGLLVSSWETVSQQLIRPHQEELTQLLAKLSGCREVSVKVFWDTEAEIQGLLAEHPNLKTERDKLVGQPLSMERVIQIGQVIEQGMSDRKQGIIDVFKGTLNSIAIEVVENTPQVDTMIYNSAYLIPWEAESQFSEHVESLDRQFENRLRIRYNN FTAPYNFARLRLTTSN*Nostoc sp. PCC MSSGLYLYGIFPDPIPETVTLQGLDSQLVYSQIIDGFTFLYSEAKQE 1297120_gvpF KYLASRRNLISHEKVLEQAMHAGFRTLLPLRFGLVVKNWETVVTQLLQPYKAQLRELFQKLAGRREVSVKIFWDSKAELQAMMDSHQDLKQKRDQMEGKALSMEEVIHIGQLIESNLLSRKESIIQVFFDELKPLADEVIESDPMTEDMIYNAAFLIPWENESIFSQQVESIDHKFDERLRI RYNNFTAPYTFAQIS*Octadecabacter MKREVVRMTDENTINSKYLYAIIKCREQREFIARGIGERGDAVHTI 130antarcticus 307_gvpF1 AYKGLAAVVSDSPVMEYDQSRRNMMAHTAVLEELMEEFTLLPVRFNTVAPEAGAIEERLLVPRHEEFTQLLGQIDKRVELGIKAFWHDGMIFEEVLRENDSIRKMRDALEGKSVDGSYYERIQLGEKIEQAMIKKRVEDEEIILSRIRQHVHKSRSNKTIGDRMVLNGAFLVDANKESDFDKAVQLLDQDLGNRLMFKYVGPVPPYNFVNIVVNWGVV* OctadecabacterMTVVAEENMTGSVGLYVCAIVAEWESNSALIKCANEAQGEIQLIG 131 antarcticus 307_gvpF2QGGITAVVMVPPEDQPVSRDRQELVRQLLVHQQLVERFTEIAPVLPVKFGTLAPDRESVELGLERGREKFFTAFGGLSGKTQFEITVTWDVADVFAKIAKLPAVVKLKVDLVATSESDRPINLDRVGRLVKETLDHQRAQTGKVLLDALLPLGVDSIVNPILNDSIVLNLALLVDTDQADALDRCLDELDSTFHGALSFRCVGPMPPHSFATVEINYIEPTQVSHACCVLELDAAHNFEEIRSAYHRLARQTQQDIAPDVVVDNKSSSVGIAVLNDAYKTLLSFVDAGGPVVVSVQRQEDAYATDIPSSGG* OctadecabacterMTDEKKVNSKYLYAIIQCREPRELKARGIGERGDVVHTVVHKGLA 132 arcticus 238_gvpF1AVVSDSPVMEYDQSRRNMMAHTAVLEELMEEFTLLPVRFNTVAPEAVAIEERLLVPRHDEFTQLLGQIDKRVELGLKAFWHDGMIFGEVLRENDSIRKMRDSLKGQSVDGSYYERIQLGEKIEKALTEKRLEDEEMILSRIRPHVHKSRSNKTIGDRMVLNGAFLVDAEKESKFDEAVQSLDQDLSDRLMFKYVGPVPPYNFVNIVVNWGES* OctadecabacterMRAQKVIPAAEENISGNVGLYVCAIVAERVSCSALIQCANDAPGEI 133 arcticus 238_gvpF2QLIGHGDFTAVVMVPEKDQLVSPDRKELMQQLLVHQQLIEKFMEIAPVLPVKFATLAPNRESVELGLEVGSEKFSAAFNSLSGKVQFEVIVTWDVAEVFAEIAKEPAVAKLKVDLAAMPESYGSVSLEQLGKLVKETLELRRAETGKVLLDALVQVGVDNVVNSILDDSIILNLALLVEAKRADAFDRCLDELDSTYHGALTFRCVGPLPPHSFATVEITYLEPAKVTEACDILELDVARSTEEVRSAYHRLARKSHPDIVPDVAVGETASVSMAVLTDAYKTLLSFVGAGGSVVVSVQRQEASYAADIISSAG* PelodictyonMDIETTKEGRYIYGIIRNSEFIDFGQIGIGKRNDRVYGVIYKDICAV 134 
phaeoclathratiforme_VSSTPIIQYEARRANMIAHQKVLEEVMKRFNVLPVRFSTISPHDND gvpF1DAIIKILITDYSRFDELLIKMKGKKELGLKVMADETRIYENIIQKYDNIRSLRDKLLNQPADKIHYQRVKIGEMVADALKKEIESYKQQILDILSPIAEDIKITDNYGNLMILNAAFLIKEVKESEFDDSVNKLDEKYGNIMTFKYVGTLPPYNFVNLSINTKGV* PelodictyonMEKDGKYVYCIIASTYECNFGAIGIGGRGDLVNTIGFQGLSMVVSD 135 phaeoclathratiforme_HPLNHFVLNPDNILAHQRVIEVVMSQFNSVIPVRFGTVAATPDEIR gvpF2NLLDRRYGELSELLERFENKVEYNLKASWRCMIDIYKEIDKEHVELKQLRREIEGLKDEEKRKLLIVEAGHIIENELQKKKEVEAYEIVTYLRKTVVAHKHNKTTGEAMFMNTAFLLNKGREVEFDNIMNDLGEQYKDRSDYYYTGPLPIFNFIDLRILPEKWEL* PelodictyonMDRQGIYIYGFIPNHYLTDIKTILIESGIYSIEYGSIAALVSDTMVDDI 136phaeoclathratiforme_ EYLNREDLAYLLVDHQKKIELIMSTGCSTIIPMQLGTIVNSGNDVIKgvpF3 IVKNGLRIINKTFDDIADIQEFDLVVMWNNFPDLIKKISDTPQIRIMKEEIANKGSYDQADSINIGKIIKKKIDEKNSKVNLDIMNSLSSLCICVKKHESMNDEMPLNSAFLIKKDKENSFIEMVNQLDIKYENLLRYKIVGPLPCYSFYTLESKLLNKKEIEKAEKILGIDAYKSESDIKKAYRAKAAHAHPDKNNTISAIDNDDFIEINKAYQILLEYSSVFKDSPDHKPDEP FYLVKIKK*Phormidium tenue MADRYYLYGIFPAPGPAELPLMGLDEQVVQAQQLGDFTFLYSLAC 137NIES-30_gvpF QKRYLSSRKNLLGHEKVLEAAMEQGHRTLLPLQFGLIVESWNQVQEDLVTPYAEDLTQLFGRLNGCREVSIKVQWEPSTELEMMMAENADLRAQRDQLEGTQLGMEQVIFIGQQIESALEERKQGIVDQFRQALSPLAKDVLENAPQTDVMIYNAAFLIPWESEAEFSQAVDAIDSTFGD RLRIRYNNFTAPYNFAQLN*Planktothrix agardhii MGNGLYLYGILPTNRVRPLALHGLDKQPIQTHPVDEFSFLYSETQQ 138str. 7805_gvpF ERYLASRRNLLGHEDVLEKVMQHGYRSVLPLQFGLIVKDWDHVKAQLIIPYQDRLKELFHKLEGKREVGVKIFWEETEELDLLMTENQELREKRDSLEGKRLSMDEIIGIGQEIERAMQDRQQGIIDKFQQILNPLAQEIVENDNLTSAMIYNAAYLIPWDIEPQFGDKIEELDHHFNNRLRIR YNNFTAPFNFAQLNP*Psychromonas MAENKKKVRKSSSKVIAKPKVIYAITAGGLQDLGNLVGINKSDIYT 139ingrahamii 37_gvpF IEKESISFVVSDLSPSSPRPRPDRRNIMAHNEILKQLMSKTSVLPVRFGTVATGERAVNRFCSQYNAQLLEQLDRVQDRVEMGIKVTWNVPNIYDYFVDNHSELREERDRVYDGNKNPRRDDRINLGHMYDALVTEARLSHQTDLEEIILPGCDEIHSIPPKDEKVVVNLACLVQRADLEVFEERVVEAGKTLDNTYDIELNGPWAPHNFVELDLKTMTGRR* Serratia sp. 
ATCCMMSIDKSRNHRAKVLYALCVSDDSTPNYKIRGLEAAPVYSIDQDG 140 39006_gvpFLRAVVSDTLSTRLRPERRNITAHQAVLHKLTEEGTVLPMRFGVIARNAEAVKNLLVANQDTIREHFERLDGCVEMGLRVSWDVTNIYEYFVATYPVLSETRDEIWNGNSNANNHREEKIRLGNLYESLRSGDRKESTEKVKEVLLDYCEEIIENPVKKEKDVMNLACLVARERMDEFAKGVFEASKLFDNVYLFDYTGPWAPHNFVTLDLHAPTAKKKTLTRAG TLSD* StellaMQTEALAPAAVAAEGKYLYCIIDAPAPATFASPGIGGRGDVVHTL 141 vacuolata_ATCC-AVGRLAAVVSDTPRIEYENSRRNMMAHTKVLEEVMAHHTLLPVC 43931_gvpFFGTVGSGDDVIAEKILEGRREELSRLLEEMRGRVELGLKATWREEVIFAEVLDEDPAVRKLRDSLVGRSPEKSHFERIRLGELIGQALLRKRRDEEERILDRVRPFVRKTKLNKPIGDRMILNAAFLVETAREAALDQSVREMDADWGARLSFKYVGPVPPYNFVTITIHW* Thiocapsa rosea strainMQQAKRQDVAAGRYIYAIIPDRGDHSLGRIGLDESEVYTIGDGRV 142 DSM 235AAVVSDLSGGRIRPQRRNMAAHQEVLKQVLREVSPLPAAFGLMA Ga0242571_11_gvpFDDEAAIIRILKDNQDAFLNQLERVDGSLEMGLRMSWDVPNIFEYFVGAHPELQELRDDFFRDGSNLTQDQMITLGRSFERLLEQDREEYTEQVESVMRSCCREIKRNKCRTEKEVLHLACLVDRDAAGRFEQVVLQAARPFDNNYAFDFNGPWAPHNFVEMDIHV* Tolypothrix sp. PCCMDAGLYLYGIFSDPIPPTVSLKGLDSQPVYSQVIEGFTFLYSDAKQE 143 7601_gvpFKYLASRRNLISHEKVLEQAMQEGFRTLLPLRFGLVVKNWETVISQLIQPCERQLRDLFQKLAGKREVSVKILWDTKAELQAMMQSNPDLKQKRDQMEGKNLSMEEVIEIGQLIESNLQQRKEAVIKTFFDELKPLAEEVVESEPMMEEMIYNAAFLIPWDQEALFSQRVEAIDKKFGDRL RIRYNNFTAPYTFAQIS*Trichodesmium MEFGFYVYGLIQEKGKMDESKDESKNGLKGSNESKDELKGLDKE 144erythraeum DVKIQDVDEFAVLYSIAKKERYLASRRNLITHEKVLESAMEAGYR IMS101_gvpFNLLPMQFGLVVSEWEKFSQDFTKPCEQQIHDLFTKLKNNREVGIKIYWEPDAELEKLLENDKDLKEERDSLKDKKLTMDQVIDIGQKIEQGMNERKQNIIEIFQETLNKMAIEVIENEVQTEKMIYNAAYLIPWDQEEDFGEKVETIDSKLCERGNFTIRYNSFTAPYNFARIRQQD* gvpF/L AncylobacterMTDLLVFAVVPADRFDPAILAEGDGLPPGLRAIAAGPLAAVVGAA 145 aquaticus strainPEGGLKGRERSALLPWLLASQKVMERLLANAPVLPVALGTVVED UV5_gvpFL1EGRVRHMLDAGAAILGEGFQAVGDGIEMNLSVLWHLDTVVARLLPGVAPELRQAAAGGDAIERQALGVVLAGLVSAERRRARARVIEALQAVTRDFAIGEPTEPGGVVNLALLVDRAAEEALGAALEALDAEFDGALTFRLVGPLPPYSFASVQVHLSPAAAVCGARAALGVEPDASPETVKAAYRRAARETHPDLVPMGGEDEEAPEATADETSRFVVLSDA YRVLEGEHAPVSLRRLDSVLTE*Ancylobacter MLYVYAITADYAAGANHLLPAKGIVPGVPVQRFGTGALGAVASP 146aquaticus strain 
VPVTVFGKEALHALLDDADWTRARILAHQRVVSSLLPLATVLPLKUV5_gvpFL2 FGTLVAGEASLAAALTSQHDALDATVARLRGAREWGVKLFFEAPTRTIRAEEPVGAGAGLAFFRRKKEEQETRAAAEAALDRCVAASHRRLASHARAAVANPLQPPELHGHPGTMGLNGAYLVAAENEAAWRVCFSELEQAYAALGARYVRTGPWAAYNFTGGGLV* Aquabacter spiritensisMSGLLVFAIVPADRIEPGLLAPAEGLPPGLETVVAAGFAAIVGTAP 147 strain DSMEGGLKGRDRGSLLPWLLASQKVIERLMARGPVLPAALGSVLEDES 9035_gvpFL1RVRHMLVCGQAALAAAFETLNGCWQTDLSVRWDLSRTVAHLMTELPPGLRAAAETGDETARRSLGAALAGLVAGERRRIQSRIGAVLGAVARDLIVSDPVEPEGVVGVALLVDAPASAQVDAALDRLDGEFEGRLTFRLVGPLAPYSFATVQIHLGPAAGLAGAHAELGLEAGAPLEAVKAAYHRLIVGLHPDLVPHGSPGDDADDAASGKGGRAARFAAV TAAYRTLQAEHAPVSLRRQDGLSPG*Aquabacter spiritensis MLYVYAITADHPGPHDAGSLPGEGIVPGAPVRLLPFGDLAAAVSP 148strain DSM VSAVDFGPEALPARLQDVDWTGQRVLAHQRVVDSLVDVATVLP 9035_gvpFL2MKFCTLFSGAAALRAALADNRAALEATVVRLRGAREWGVKLFWEAPPAEPAPVERGPGAGAAFFQRKRDAQRLRAEAEAALAHGVAESHRRLAARARAAVANPVQPAAVHRRRGEMALNGAYLVPRADEAAWRESLAELERTYAGAGIRYELTGPWGPYNFTGGGLAGS* BradyrhizobiumMTMNLVGITTPDVAGAIAAAGGRLADVETRAVEAGGLVALLALS 149 oligotrophicumKAPFWHVLRRSRTALRSMLTAQRILEAAAVYGPLLPARPGTLIRN S58_gvpFL1DAEACMLLRSQCRHLAEGLRLHGTSRQYQITISWDPVAALAARRDHQDLVEAAAASADGAADKAASMIQRFMSDQQARFEAEAMRALAAVAEDVITLPVNQPDMLMNAVVLLAPGAEPELERVLEALDRGLRGKNLIRLIGPLPPVSFAAVSIERPGRQRIAAARRLLGIGEATRTCDLRRAYLDKAHAHHPDTGGHAADASIVGAAAEAFRLLARVAEARASA GQDDVILVDIRRQDQQRSLST*Bradyrhizobium MSKANLGIGLVHGVVTAQSAALLPQIVDAFDATEIIVVNTEQQALL 150oligotrophicum ISDIPQYLRGHVEADTLFSDPARISTLAMKHHRILQAAAVVTDVVP S58_gvpFL2VRLGTLVRGPSGARDLLNREAVRFAGHLVTIHNALEFSVRILPTEQPSRRVARPVPSSGRDYLRIRRDERCGQRPAVVDITLQELASRAVAIRERQSASRSGGRTPALAEAAFLVDRHALAAFDDCAGRIERQIAEN GLALDIFGPWPAYSFVDGARENLG*Bradyrhizobium MSSPRLIGLLAADDVPADLADQIMSCGPVAAAIRFAPAAASSSESL 151oligotrophicum DHHAAVVAWCRRAAFLPSRAGIPISPELLQSIARSAWYHRSTIEHIES58_gvpFL3 GRVEISVELERRDGVRDGGIDGGGRAYLRATAHDLRACEVGVATAANLLAMYSERADADLIARTAPLPAIRLRASVLVRRAVAPRLARQFDSMLSAISDRLVCRVTGPWPPYSFSTIREPS* BurkholderiaMVWLTYAVLTPKRSITLPPGVAGARLEIVDGAHLRTIVSEHPRAPS 152 thailandensis sp.ATIPSALDFGQTVAALFRHGAIVPMRFPTCLDSKQAVRDWLDDES Bp5365 
strainDMYRDLLQRIDGCVEMGLRFRLPEAPRAQPRPQAGGPGHAYLAA MSMB43_gvpFLRGAPNSVARSHGERIAAVLRNLYRDWRFDGLVEGFVSLSFLVRQTTLDDFVDRCRQAARETAFPLYMSGPWPPYSFATDERSSAPEPHRALRLMRRPSTAVSISANVAAPEKKDSAR* DesulfobacteriumMTLHLLYCVFSSGEMEKTRKLVPPGIDGEPVHEICSNKISGVVSTL 153 vacuolatum-DSMGKPPDTHVKSLLAYHGVIDSYHQNRTVIPMRFAAVFRTYAHMITA 3385_gvpFLLNNNEKSYLLQLKRLHDCTEMCVRFISNSPCCVKKKEPAISPKKISGTTFLQQRKAMYEQQNRLPPEIHEKTRDILQHFRGLYMEFKQESQPLEKDCPSLSLQGAEKTDGNALLISLFFLISKKNISLFRSRFQNICGSSSGRHMMNGPWPPFNFINTESNLTDPS* Desulfomonile tiedjeiMLGSLAAIQFLSISSYGADEMKFLMYCIFTENSIEPPHSLVGVNRSP 154 DSM 6799_gvpFLVRIISCDGLAAAVSVITQKEIPRDPATGLDYHKVIQWFHERIGVIPLRLGTCLGHESDVVQLLHSHGARYKSLLKELDGCVEMGIRVIHDRPGPQELASKSPFISRFNGTESGTDYLMRRKVLFDADEFAISRNREIVERYHSPFTGLYVSFKAQTSKFSPLGTDRNSVLTSLYFLIPRQSADSFRAIYGDLRSGLHERIMLSGPWPPYNFVLPEDCL* EnhydrobacterMEGHRIYIYGIVRDAADGGPAPVPPVAGLDGGALRAIAGYGLAAI 155 aerosaccus strainASAVDLSKAGIPFEEQLKDPDRATALVLEHHRVLQQAIDAQTVLP ATCC 27094_gvpFLMRFGALFQDDRGVTDALEKNRCGLMDALGRIDGAREWGVKIFCDRAVAARQLSATSAVVQAAEKELSGLAEGRAFFLRRRLERLRTEETDRAVAHEVDVSRQALCELARASAPLKLQPAAVHGRGEDMVWNGAFLVPRSGEERFLSRLEVVVQSRSDLGLHYEVTGPWPPFSFVDGQL EGGGDACPDGA*Octadecabacter MRSATSIVYAYGVLTNCSDIALDMPRSDLAGLVKNGPLRILPFGNI 156antarcticus 307_gvpFL AAVVCDFVLPNGSDLETLLEDSRSAERLILNHHQVLSYIVSQHTILPLRFGAAFTEDAGVIAALGGRCSELQKALGRIDGALEWGVKTFCDRKLLKQRVRGTGSEISDLESEIAKQGEGKAFFLRRRKERLILEEVEEILEQCVVGTQEQLEPSVIEEALVKLQPPTVHGHEHDMLSNISYLIARGTEDAFMQSLEDLRLAHAPYGLEYQMNGPWPAYSFSDQQLEGGV NDQ* OctadecabacterMSSATSIVYVYGVLTNCSDLVLDFPPGDLAGIVESGPLRILPFGDIG 157 arcticus 238_gvpFLALVCDFILPDGSDLKTILEDSRSAERMILNHHLVLADMVSRYTILPLRFGAVFAEDAGVIAALGGRYSTLQKELDRIDGAIEWGVKSFCNRKMFSECVAETVSEISVLEKEIADQGEGKAFFLRRRIQRLILDEVEKTLEQCLVGAQDQLKSRAIEETLVKLQPPTVHGHKHEMVSNRSYLIARGAEDAFMQSLDDLRVVYAPFGFDYQINGPWPAYSFSDQQLGGGV NDK* RhodobacterMGHYLYGLLAPPARGTLAQMQAAAAGVTSLGGPVALSAVEGML 158 capsulatus SBLVHCPCDLAEISQTRRNMLAHTRMLEALMPLATCLPVRFGVIAQD 
1003_gvpFL1LAEVARMIHERRAELVGHAQRLLDPVEIGLRVRFPRDRALAQLMAETPDFVAERDRLMGQGAGAHFARADFGRRLAEALDARRTRDQKRLLAALRPHVRDHVLRAPEEDVEVLRAEFLIPAAGVDAFSRIAHDLAAALGFAGAAEPELQVIGPAPPYHFLSLSLAFDNTSEAA* RhodobacterMAHEIIAILPCEAAQLPSGLTGVVGRGATAVLAPAPGWAERLTGG 159 capsulatus SBPKQTAVRHHSRLEALMAMGSVLPFAAGIACTPEEAALLLRLDAPLI 1003_gvpFL2ARLAAEIGPRRHFQLALDWDESRVLAAFRDSPELAPLFSGAAVTPEALRQAITALADRLSATALRLLDPVAEDPVEQPRAPGCLLNLVFLLRPEDEPRLDAALQAIDALWSEGLRLRLIGPSAPISHALVDIDRADVAALAAAADLLKVAPEAGPEAVTEAAKAALRSPDLAANAAEQIRAAARLLLRAGDIAALGLSGAATLPHLVHLRPGGRKSGLTSSGEAA* RhodobacterMTGLALHGFVSPDGWSAAAAPPARCAVVLGGVAALVSEAGDAL 160 capsulatus SBDTPETAQAAALAHHALISAWHRRGPVLPVRLGTVFSSQAALQTAL 1003_gvpFL3APKAAQLRAALDALADKEEMVLTIVPAARPPDLPPPAATGADWLRARKAVRDRGQARQTDRQQTLAGLQDALRAQGVASLAAPAPREGGSRWHLLIARDDGAGLDRWLAAQADRFDAAGLDLTLDGPWPP YRFAAEILEALDG* RhodobacterMSEPRISGLAPWRADLPDVIGCHGGWVLMGAAADETPEARLRRQ 161 capsulatus SBVGWCRAAVDVLPLSPRLAPTRAEAERLVATRGPDLERAHRHIRGR 1003_gvpFL4LQVIVQLEMCRTDLGLVRREISGGRSWLQDRAERATREARANADFEAQVRRVVRALFPREGQVVTLAPSGTAGQLRLRRAVLVPRAGLQAFAAALSADLDRDGRGGLWDVIAPLPPLAFAALEAGPGGAVT* RhodobacterMIYLYGLLEEPASGHEVLAGMAGVTGPIALARLPGGILIYSSATEA 162 sphaeroidesDILPRRRLLLAHTRVLEAAAWFGNLLPMRFGMMASTLAEVAAML 2.4.1_gvpFL1ASRLTELCAAFDRVRGRVELGLRLSFPREPALAATLATAPDLAAERARLLALRRPDPMAQAEFGRRLAERLDARRGETQRLLFQSLRPLWVDHRLRVPDSDVQVIAVDVLVEDGAQDRLAAALVKAAADCSFAPTAEPSVRVIGPVPLFNFVDLVLSPRREEVA* RhodobacterMRLREVVAVLEGHPPSVLPEGTEAICEAGLTAILGMPPGLLSGRRA 163 sphaeroidesLLEHAACRQAVLERLMAFGTVLPVLTGNCLTPAEAAAALAANSP 2.4.1_gvpFL2RLRQELRRLAGRVQFQVLVQWHAALVPKRTDPDETAEDLRLRFTHRIADALARVAERHVNLPLREDMLANQALLLLQTRTDDLDRSLEQIDALWTEGLRIRRIGPSPPVSFASLNFRRVSSAAIRRARHRFDLEGPVDPIRLRALRRDLLLRASEAERAEILAAAAVLDLLTRCAASGGDLH LVRIWSEGQAVPSDLEDAA*Rhodobacter MSGLLLLGVVSGLGISPAITSPHLRLDGDGYAAILLSLDRLPPDPAS 164sphaeroides PDWAVQAALAQNAILSAYAATEDVLPVALGAAFTGIAAVKRHLD 2.4.1_gvpFL3AERATLDAGMERLAGRAEYVAQLIAEQVADGAAPAPASGSAFLKARSARHEQRRHLARERTGFARATAEELASLSCSASARPLKPDGPLLDLSLLVARDRVPGLLEAAEASSRAGSRLALSVRLIGPCAPFSFLPET RGHD* 
RhodobacterMAGDARSRVRLHLAAMRDCETFLPFPPAATIAVDEAIAWCGRRTN 165 sphaeroidesALAEEIDRFSRQRQLTVSARLIAPLLPDAAASGAGWLRARRDASA 2.4.1_gvpFL4HQARLRTVLMQIMSLLGEVRCIPGRLQDEVQVNLLVPAAETHPVLHELRERLRVGDALWSACTVTGPWPPYAFISWETA* Rhodococcus hoagiiMSEQESAPDGGGPVVYVYGLVPADVEVKEDATGIGSPPRPLKIVH 166 103S_gvpFL1HEDVAALVSEIDPDTPLGSSDDLRAHAAVLDSTATVAPVLPLRFGAVLTDTDAVVAELLEPYRDEFHEALEQLEGKVEFVVKGKYVEDAILREILADDPEAARLRDVVREQPEDTTRDERLALGERISQALTAKREQDTGRIVEALQPAATAVAPREPTDDEEAGSVAVLISADGVDELDKAVARLIDDWQGRVEVTVTGPLAAYDFVKTRAPGT* Rhodococcus hoagiiMTPDDGVWVYAVTGDGSFPGGISGIRGVAGEELRTVTDSGFTAVV 167 103S_gvpFL2GTVRLDTFGEEALRRNLEDLDWLADTARRHDAVVAAICAGGATVPLRLATVYFDDDRVRTMLRDNAEQLGEALQQIADRSEWGVRAYLERPRSEPRDAREKTGRPSGTAYLMQRRAQVAAREQAESAAGRRADEIFAELARWAVAGVRQPPSPPDLAGRRSQEILNTSFLVDNGRHREFVTAVEELDARLSDVDLVLTGPWPPYSFTSVEASAR* Serratia sp. ATCCMSLLLYGIVAEDTQLALEPDGSPHAGEEPMQLVKAATLAALVKPC 168 39006_gvpFLEADVSREPAAALAFGQQIMHVHQQTTIIPIRYGCVLADEDAVTQHLLNHEAHYQTQLVELENCDEMGIRLSLASAEDNAVTTPQASGLDYLRSRKLAYAVPEHAERQAALLNNAFTGLYRRHCAEISMFNGQRTYLLSYLVPRTGLQAFRDQFNTLANNMTDIGVISGPWPPYNFAS* Stella vacuolata-MSGLLVFAIVPADGIEPGILAPREELPANLRAVAADGFAAVVGAAP 169 ATCC-43931_gvpFL1EGGLKGRDRSVLLPRLLASQKVIERLMARGPVLPVTLGTVLEDEARVRHMLAAGAPMLEAAFGTLGDCWQMDLSVRWDLNQVVARLMGEVPGDVRAAAGSGDEAARRALGEALAGLAAGERRRVQSRLAAALRDVARDLIVSEPVEPESVVDIAILVERPALAEVEAALDRLDAEFEGRLKFRLVGPLAPHSFATVQVHLAPEAALAGACAELGVERGAGLQDVKVAYHRALVRFHPDLAPHGDDGGPEDEHDGGEGRASRLLTVTAAYRALQAEHAPISLRRQDGIAVNQEQDASAAMGQQRGIVPGRE LQALRM* Stella vacuolata-MLYVYAIAADHPDPDNAMFGGEGIVPDAPVRLLQLGDLAVAASL 170 ATCC-43931_gvpFL2VSAADFAADALRAHLEDARWTALRVLAHQRVVDSLLPHATVLPMKFCTLFSGEAALKQALAHNRAALQATVERLRGAREWGVKLYWEAPRNPAPPSAGQGEAGAGAAFFQRKRDQQRQRAEAEAAVARCVAASHRRLADAARAAVANPVQPPAVHRQPGEMALNGAYLVARAAEPAWREVLAELERTHADGGIRYELTGPWGPYNFTGSGLVGS* Thiocapsa rosea strainMSDRPRPMLHCILRSPPGSIARAEAGLRWIERDGLAALVADREPSE 171 DSM 235 Ga0242571-IAGASSVGLQRYADIVAEIHACAAVIPVRFGCLLAGDEAVGKLLHR 
ll_gvpFLSRDRLHGLLDQVGDCLEFGIRLLLPADAPAATDDDAAPRLHANAPSDPRADPDMGPGLSHLLAIRHRLDVEASLAARAREAREVIKGRVAGRFREVREELGQIDGRSLLSLYFLVPREQGEHFVECLRQDASSLRGTGLLTGPWPPYNFVGAIDDDIRSLD*
gvpG
Anabaena-flos-aquae_gvpG MLTKLLLLPIMGPLNGVVWIAEQIQERTNTEFDAQENLHKQLLSLQLSFDIGEIGEEEFEIQEEEILLKIQALEEEARLELEAEQEEARLELEAEQEDFEYPPQFTAEVNKDQHLVLLP* 172
Bacillus-megaterium_gvpG VLHKLVTAPINLVVKIGEKVQEEADKQLYDLPTIQQKLIQLQMMFELGEIPEEAFQEKEDELLMRYEIAKRREIEQWEELTQKRNEES* 173
Ancylobacter aquaticus strain UV5_gvpG MGMLTDVVFAPAVGPLKGVLWLARIIAEQAERTLYDEGVIRAALLDLEQQLEAGEIDEDAYETQETVLLERLKIARERMRSGL* 174
Aphanizomenon flos-aquae NIES-81_gvpG MLTKLLLLPIMGPLNGLVWIGEQIQERTNTEFDAQENLHKQLLNLQLSFDIGEISEEDFEIQEEELLLKIQALEEEARLELELAEEEARLELELEQEEEEDFVVKPQLTTEIDRDKDLVLLP* 175
Aphanothece halophytica (strain PCC 7418)_gvpG MVFKLLLLPITGPIEGVTWLGEQILERANQELDEKENLNKRLLSLQLSLDLGEISEEEYDEQEEEILLAMQAMEDEENNQAEEETD* 176
Aquabacter spiritensis strain DSM 9035_gvpG MSLVTDVLFAPAVGPLKGVLWLARLIAEQAERTLYDEDVLRAALLDLEQRFEAGEISEADYETEEDILLARLKIARERMRSGL* 177
Bradyrhizobium oligotrophicum S58_gvpG MLFQILTSPVSGPFRMVSWIGGAIRDAVDTKMNDPAEIKRALAALEQQLEAGSLSEQDYERMEMELIERLQSSLRHGSGNGG* 178
Burkholderia thailandensis sp. Bp5365 strain MSMB43_gvpG MFILDNLLAAPIKGMFWIFEEIAQAAEEETIADIEMIKAALVELYRELESGQIDETEFETRERALLDRLDSLETS* 179
Chlorobium luteolum DSM 273_gvpG MFILDDILLAPLSGMVFLGRKINEIVQNEMSDEGAVKEQLMKLQFRFEMDELSEEEYDRLEDELLSTLAEIRAQKENR* 180
Dactylococcopsis salina PCC 8305_gvpG MVFKLLLLPITGPIEGITWLGEQILERADQELDSKENLNKRLLSLQLSLDLGEISEEEYDEQEEEILLAMQAMEDEENEEEES* 181
Desulfobacterium vacuolatum_DSM 3385_gvpG MFLVDDILFFPAKSLVWVFRELHNAVQQEKTNESDALTTELSELYMMLETGKITEEEFDEREEQILDRLDEIQERDQ* 182
Desulfomonile tiedjei DSM 6799_gvpG MERYTMFLLDDILFLPMNGVLWICNEIHDAAEQELHNESDAITAQLQKLYTLLEAGDIGESEFDVLEAELLDRLDAIQERGALLEA* 183
Desulfotomaculum acetoxidans_DSM 771_gvpG MLGKLLLSPILGPVMGVKFIAEKIKQQADQELYDKSKIKQDLMELQIKLELEEITEEYYLQREEELLVRLDELASMETEEEEV* 184
Dolichospermum circinale_gvpG MLTQLLLLPIMGPLNGVVWIAEQIQERTNTEFDAQENLHKQLLSLQLSFDIGEISEEEFEIQEEEILLKIQALEEEARLELEAEQEEARLELEAEQEQARLELEAEQEELENQPQLTPKIDTYRHLVKL* 185
Enhydrobacter aerosaccus strain ATCC 27094_gvpG MGMLARLLTLPVSAPVGGVLWIARKIEEEANAERWDRNKITGALSELELELDLGAIDVEEYDAREAVLLQKLKELQEVEND* 186
Isosphaera pallida_ATCC-43644_gvpG MFLVDDILLAPAHSLMFLLREIHQAALEELRRDAQKVREELAECYRALETGALTDEEFASLETDLLDRLDALEELARFNSDEDDDPEDEDWDVEDDDPAEAVW* 187
Legionella drancourtii LLAP12_gvpG MLLLGSILMAPVHGLMAIFEKIKEAVDEEKQHDIERIKSELMALYTKLESGELSEADFEKQEKILLDKLDSLEDEDD* 188
Microcystis aeruginosa NIES-843_gvpG MFLDLLFLPVTGPIGGLIWIGEKIQERADIEYDEAENLHKLLLSLQLSYDMGNISEEEFEIQEEELLLKIQALEEEEAENESESSL* 189
Nostoc punctiforme ATCC 29133_gvpG MVLRFLLLPITGPLMGVTWLGEKILEQASTEIDDKENLSKQLLALQLAFDMGEIPEEEFEIQEEALLLAILEAEQEERDQTQEY* 190
Nostoc sp. PCC 7120_gvpG MLGKILLLPVMGPINGLMWIGEQIQERTNTEFDAQENLHKQLLSLQLKFDMGEISEEEFDIQEEEILLKIQALEAEERLNAESEEDDDLDVQPIFILASEENPVYQDQSRFSEEYEDKEDLVLSP* 191
Octadecabacter antarcticus 307_gvpG MGIILNTLMSPLIGPMKGVFWVAEQIKDQTDAEIYDDSKILVELSELELLLDLEKIELKDFEAKEDVLLKRLQEIRKAKKNDSV* 192
Octadecabacter arcticus 238_gvpG MSIILNTLMGPLIGPMKGLLWVAEQIKDQADAELYDDSKILVALSELELSFDLEQIELKEFEAQEDVLLQRLQAIRKAKQNDTD* 193
Pelodictyon phaeoclathratiforme_gvpG MFILDDILFAPLNGLIFIAKKINDVVEKETSDEGVVKERLMALQLRFELDEIDEVEYDREEDELLQKLERIRLNKQNQ* 194
Phormidium tenue NIES-30_gvpG MLFKLLFAPVLGPIEGISWVANKLLEQADVPTNDLESLQKQLLALQLAFDMGEVAEADFEIQEEEILLAIQAIEDEEDEDE* 195
Planktothrix agardhii str. 7805_gvpG MILRLLLSPITAPFEGVIWIGEQLLERAEAELDDKENLGKRLLALQLAFDMGDIPEEDFEVQEEELLLQIQALEDEANQENDEID* 196
Psychromonas ingrahamii 37_gvpG MFILDDILLAPYSGIKWLFKEIQRQAQEELDGEADRITTDLTNLYRQFESNEITEQEFEERETVLLDRLDELQEESNLLDEEYDEEYEDDDEEYEDDDEEYEDDDEEYEDDDEEYEDDDKNDKDKNDDHDNDDDDENKDENDKYNDEER* 197
Rhodobacter capsulatus SB 1003_gvpG MGLLRKLLLAPVELPITGALWIVEKIAETAESELTDPGTVRRLLRGLEQQLEAGEITEEEYEFAEEILLDRLKRGQAAEARSGGP* 198
Rhodobacter sphaeroides 2.4.1_gvpG MGLLTSLLTLPFRGPFDGTLWIAARIGEAAEQSWNDPAALRAALVEAERQLLAGELSEETYDAIELDLLERLKGTAR* 199
Rhodococcus hoagii 103S_gvpG MGLFSAIFGLPLAPVRGVVWIGEVVRRQVEEETTSPAAMRRDLEAIEEGRRSGEISEDEAAQAEDEILHRVTRRRDAGASGEE* 200
Serratia sp. ATCC 39006_gvpG MLLIDDILFSPVKGVMWIFRQIHELAEDELAGEADRIRESLTDLYMLLETGQITEDEFEQQEAVLLDRLDALDEEDDMLGDEPGDDEDDEYEEDDDEEDDDEEDDDDEDDDDEDDDDEEDDDDDEDDDDEDEPEGTTK* 201
Stella vacuolata_ATCC-43931_gvpG MGLVTNVAFAPVVGPLKGVLWLARLIADQAERTLYDEDLVRAALLDLEQRLDAGQISEADYDAEEEILLARLKIARERMRSGL* 202
Thiocapsa rosea strain DSM 235 Ga0242571_11_gvpG MLIVDDLLAAPFKGIIWVFEEIHKSATAEQRARRDEIMAALSALYRALEQGEITDDTFDTREQALLDELDALDAREDANELGSDEDEDDLDGAGEDAS* 203
Tolypothrix sp. PCC 7601_gvpG MEVMIMLGKILLFPVMGPISGLMWIGEQIQERTDTEFDAQENLHKQLLSLQLSFDIGEISEEDFEEQEEELLLKIQALEEEKARLEAESIEDEEDEVEPTYFIAEVEEDKVLAEAFRGNKKYEDNENLVLSP* 204
Trichodesmium erythraeum IMS101_gvpG MLLRLLTLPISGPLEGVTWLGKKLQEQVDTEIDETENLSKKLLTLQLAFDMGEISEEDFEDQEEELLLAIQALEEQKLKEEEEDA* 205
gvpJ
Anabaena-flos-aquae_gvpJ MLPTRPQTNSSRTINTSTQGSTLADILERVLDKGIVIAGDISISIASTELVHIRIRLLISSVDKAKEMGINWWESDPYLSTKAQRLVEENQQLQHRLESLEAKLNSLTSSSVKEEIPLAADVKDDLYQTSAKIPSPVDTPIEVLDFQAQSSGGTPPYVNTSMEILDFQAQTSAESSSPVGSTVEILDFQAQTSEESSSPVVSTVEILDFQAQTSEESSSPVGSTVEILDFQAQTSEEIPSSVDPAIDV* 206
Bacillus-megaterium_gvpJ MAVEHNMQSSTIVDVLEKILDKGVVIAGDITVGIADVELLTIKIRLIVASVDKAKEIGMDWWENDPYLSSKGANNKALEEENKMLHERLKTLEEKIETKR* 207
Ancylobacter aquaticus strain UV5_gvpJ1 MNEQRMEHSLQAVGLADILERVLDKGIVIAGDITISLVEVELLNIRLRLVVASVDRAMSMGINWWQSDPHLNSHARELAEENKLLRERLDRLEAAVVPSALPADAALEPSLAGEDARHGG* 208
Ancylobacter aquaticus strain UV5_gvpJ2 MPSRHSGEIAVADLLDRALHKGLVVWGEATISVAGVDLVYLGLKLLLTSTDTVNRMREAANAPPDERHLHAD* 209
Aphanizomenon flos-aquae NIES-81_gvpJ VTSTPILPTRPQTNSSRAINTSTQGSTLADILERVLDKGIVIAGDISISIASTELIHIRIRLLIASVDKAKEMGINWWETDPYLSTKAQRLVEENQQLQNRLENLESQINLLTSAKVQEQISLVETTEDNTHQTTEDNTHQTHEESIPLPIDSQLDV* 210
Aphanothece halophytica (strain PCC 74418)_gvpJ MVNPNTNKPKSYQSKGITNSTQSSSLADILERVLDKGIVIAGDITVSVGSTELLSIRIRLLVSSVDKARELGINWWEGDPYLSSQANLLKEENQALQNRLENMEAELRRLKGETNPEPSFLSESEDNS* 211
Aquabacter spiritensis strain DSM 9035_gvpJ1 MSEQRMEHSLQAVGLADILERVLDKGIVIAGDISISLVEVDLLNIRLRLVVASVDRAMSMGINWWQSDPHLNSHARQLEEENRLLRERLDRLEAALAPPEGGMLRAEVEVAHGG* 212
Aquabacter spiritensis strain DSM 9035_gvpJ2 MPDPEPIIPRTSGDVALADLLDRALHKGLVLWGEATISVAGVDLVYLGLKVLLASTDTANRMRDAAAASAAGSHLPGG* 213
Arthrospira platensis NIES-39_gvpJ MTLQSRSSSPQRGVPMSTSGSSLADILERVLDKGIVIAGDISVSVGSTELLSIRIRLLIASVDKAKEIGINWWESDPYLSSQAQQLSQSNQQLLEEVKRLQEEVRSLKALTSQSSQPVTPPNSENDD* 214
Bradyrhizobium oligotrophicum S58_gvpJ1 MTFTVHQPTGGDRLADILERVLDKGIVVAGDVTISLVGIELLNIKIRLIVATVDRALELGINWWEADPRLTTRASELSVENEELKKRLALLEADAGRNQRPRKRRVRSIAATSGASHER* 215
Bradyrhizobium oligotrophicum S58_gvpJ2 MTYRADLDYLEPAASSEGSLLELLDHLLDRGVLLWGELRISVADVELIEVGLKLMLASARTADRWRQTTTQRASIAPGDCP* 216
Burkholderia thailandensis sp. Bp5365 strain MSMB43_gvpJ1 MRSADGEPVSAELAQRLSLCESLDRILNKGAVISAQVVVSVADVDLLYLHLRLLLTSVETALVGRAMPREEASR* 217
Burkholderia thailandensis sp. Bp5365 strain MSMB43_gvpJ2 MADLLERVLDKGVVITGDIRINLVDVELLTIRIRLLVCSVDKAKELGIDWWNADTFFLGPDRGQSALPGRASAVDVAAGSAVHADAAHR* 218
Chlorobium luteolum DSM 273_gvpJ1 MPELKHAVNATGLADILERVLDKGIVIAGDIKIQIADIDLLTIKIRLMVASVDKAIEMGINWWQEDPYLSTGAKTSEQTRLLGEINQRIEKLESINR* 219
Chlorobium luteolum DSM 273_gvpJ2 MQEDLYTANRQVTLLDILDRVLNKGVVISGDIIISVAGIDLVYVGLRVLLSSVETMERLDAARAEGLQQ* 220
Chlorobium luteolum DSM 273_gvpJ3 MAVEKTIGSSSLVEVIDRILDKGVVVDAWVRVSLVGIELLAIEARVVVASVETYLKYAEAIGLTAKAA* 221
Chlorobium luteolum DSM 273_gvpJ4 MAVEKTIGSSSLVEVIDRILDKGVVVDAWVRVSLVGIELLAIEARVVVASVETYLKYAEAIGLTAKAA* 222
Dactylococcopsis salina PCC 8305_gvpJ MVNSNTNQPKSYQSKGITNSTQSSSLADILERVLDKGIVIAGDISVSVGSTELLTIRIRLLISSVDRAREIGINWWESDPYLSSQAHLMKEENQALQSRLENMEAELRRLKGETNLDQSSLGESDQRSLQ* 223
Desulfobacterium vacuolatum_DSM 3385_gvpJ1 MAYIDIDNDASKQISICEALDRVLNKGAVITGELTISVADIDLIYLSLQAVLTSVETARHMFDSQINDAVKEVK* 224
Desulfobacterium vacuolatum_DSM 3385_gvpJ2 MPIQRTAQHSIESTNIADLLERVLDKGIVIAGDIKISLVDIELLSIQLRLVICSVDKAKEMGMDWWVNNPVFMPNKGTQNDEIADTLTKINSRLEHLEKATISGS* 225
Desulfomonile tiedjei DSM 6799_gvpJ1 MMDEEEHVSLCEALDRVLNKGAVIAGEVTISVANVDLIYLGLQVVLASVDTIRGKRNELLRHDVGLHLTADNA* 226
Desulfomonile tiedjei DSM 6799_gvpJ2 MSIQASTRHSIQSTNLADLLERVLDKGVVIAGDIKIKLVDVELLTIQIRLVVCSVDKAKEMGMDWWTNNPAFQPALAQISE* 227
Desulfotomaculum acetoxidans_DSM 771_gvpJ1 MGPQMGPIKSTGNLSLLDVIDRILDKGLVINADISVSIVGVELLGIKIKAAVASFETAAKYGLQFPTGTEINEKVSEAAKQLKEICPECGKKSGRDELLHEGCPWCGWISARALRLETEHSQR* 228
Desulfotomaculum acetoxidans_DSM 771_gvpJ2 MLPIREERATLTDLLDRVLDKGLLLNADILISVAGVPLIGITLKAAIAGMETMKKYGLLIDWDQESRLAERRLRSSRH* 229
Enhydrobacter aerosaccus strain ATCC 27094_gvpJ1 MAVTNGRMEHSIQGSSLADILDRILDKGIVIAGDVTISLVGVELLNIRLRLLVASVDKAIEMGINWWEADPYLTSQTKASSEQTELLQQRLERIEGLLAGQATKEQPL* 230
Enhydrobacter aerosaccus strain ATCC 27094_gvpJ2 MPVQTAHDGELALADLLDRALNKGVVLWGDATISLAGVELVYVGLRVLVASCSTMEKYRSSPRKGSMPIARGES* 231
Isosphaera pallida_ATCC-43644_gvpJ1 MIVCSSSTPERIGPPMNLPPPHHAPWCYDSPDLETLPLDPAERIALCEVLDRVLNKGVVIHGEITISVAGVDLVYLGLNLLLTSVETAQSWKFRGMIE* 232
Isosphaera pallida_ATCC-43644_gvpJ2 MAITRSSRPDVTHSTSGATLADVLERVLDKGLVIAGDIKIKLVDVELLTIQIRLVVASVDKAREMGLDWWTRSPELSSLAATTCPALTPPKQEATPPATRIQAPTESAQTTPDQSHPSDPSASNIDEVAELRRHIELMQLRDEARQRAHREELAALRAQLTRLTELLDSPR* 233
Legionella drancourtii LLAP12_gvpJ1 MIIEDKPVSLCETLDRVLNKGVVVAGTVTISVADVDLLYLDLHCLLSSMKGMNLIGSERER* 234
Legionella drancourtii LLAP12_gvpJ2 MELQKSPTHSIGSTTIADLLERILDKGIVIAGDIKVNLVQVELLTIQIRLLICSVDKAKEIGMDWWTHQNDVQSKNGSMPIQEYVTQMEERLKNLENTLASSKNAI* 235
Lyngbya confervoides BDU141951_gvpJ MTGQSLSRSSSANRQMATATQGSTLVDVLERVLDKGIVIAGDISVSVGSTELLTIRIRLLVASVDKAREMGINWWENDPYLSARSQELLTANEQLQSRIESLEQELKSLRSQED* 236
Microcystis aeruginosa NIES-843_gvpJ MTSSTFAGSLRNQSNNSLKTATQGSSLADILERVLDKGIVIAGDISVSIASTELINIRIRLLIASVDKAREMGINWWEGDPYLHSQSQALLAENRELSLRLQTLETELETLKSLTQLSAMESHDTSPNDEAHSSDA* 237
Nostoc punctiforme ATCC 29133_gvpJ MSTNTNRGAITTSTQGSTLADILERVLDKGIVIAGDISISVGSTELLNIRIRLLISSVDKAKEIGINWWESDPYLNSQTRTLLATNQQLQERLASLETELQSLKALNPINHQNAGD* 238
Nostoc sp. PCC 7120_gvpJ MTTTPIHPTRPQTNSNRVIPTSTQGSTLADILERVLDKGIVIAGDISISIASTELIHIRIRLLISSVDKAREMGINWWENDPYLSSKSQRLVEENQQLQQRLESLETQLRLLTSAAKEETTLTANNPEDLQPMYEVNSQEGDNSQLEA* 239
Octadecabacter antarcticus 307_gvpJ1 MNDGKMEHSLNATNLADILERVLDKGIVIAGDVTISLVGVELLNIKLRLLIASVDKAMEMGINWWAHDPFLTAGAQAPAVADPAMLERMDRLEAALATALASNQTTPMKGHK* 240
Octadecabacter antarcticus 307_gvpJ2 MTNKAQGGQDLALADLLDRALSTGVVIWGEATISLAGVDLVYVGLKVLVASVDAAERMKAASLVDRPTDRGQQI* 241
Octadecabacter arcticus 238_gvpJ1 MNNGKMEHSLDATNLADILERVLDKGIVIAGDVTISLVGVELLNIKLRLLIASVDKAMEMGINWWAHDPYLTAGAQAPVGVDPAMLERMDRLEAALAKALASNQTTPAEGQSS* 242
Octadecabacter arcticus 238_gvpJ2 MTNETQGGQDLALADLLDRALSTGVVIWGEATISLAGVDLVYVGLKVLVASVDAAQRMKDASLVDRPTDGGQ* 243
Pelodictyon phaeoclathratiforme_gvpJ1 MPELKHAVNATGLADILERVLDKGIVIAGDIKIQIADIDLLTIKIRLLIASVDKAMEMGINWWQEDTYLSTKAKDKEQQLLRDDLQQRIEKLEALTKIT* 244
Pelodictyon phaeoclathratiforme_gvpJ2 MQDEFYSKNKEITILDVLDRVLTKGVVITGDIVISVADIDLVYVGLRLLLSSVETMEKNKQNSIKM* 245
Phormidium tenue NIES-30_gvpJ MATATQGSSLVDVIERVLDKGIVIAGDISVSVGSTELLSIRIRLIISSVDKAREIGINWWESDPYLSSRTNELLEANQQLQSRLETLEAELKALRSAEPVS* 246
Planktothrix agardhii str. 7805_gvpJ MNSQQLPSNIQRGVPTSTQGSSLADILERVLDKGIVIAGDISVSVGSTELLNIRIRLLIASVDKAREIGINWWESDPYLSSQTKVLTESNQQLLEQVKFLQEEVKALKALLPQENQPNPISDPHK* 247
Planktothrix rubescens_gvpJ MNSQQRPSNIQRGVPTSTQGSSLADILERVLDKGIVIAGDISVSVGSTELLNIRIRLLIASVDKAREIGINWWESDPYLSSQTKVLTESNQELLEQVKLLQEEVKALKALLPQENQPKEME* 248
Psychromonas ingrahamii 37_gvpJ1 MANVQKSTDSSGLAEVVDRILEKGIVIDAFVKVSLVGIELLSIEARVVIASVETYLKYAEAIGLTASAATPA* 249
Psychromonas ingrahamii 37_gvpJ2 MPMANVSINPELTAQECEKISLCDALDRIINKGVVIHGEITISVANVDLISLGVRLILSNVETREQSNTPKEEV* 250
Psychromonas ingrahamii 37_gvpJ3 MATGKPQSMTHSVKSTTVADLLERILDKGIVVTGDIKIKLVDVELLTVELRLVICSVDKAVEMGMDWWNNNPAFAPQAPAQEGELSSIEKRLEKIEKALVK* 251
Rhodobacter capsulatus SB 1003_gvpJ1 MGYRSASQPEGLADVLERILDKGIVIAGDVSVSLVGIELLTIRLRLLIATVDKAREMGIDWWSHDPYLNGRLRPGEPAPETETETAALRDRLAQLEAQLSALGAQVGAAPALAEPALRGLAAAGSSALCAAPEASSADVVQPVFRRYKEAP* 252
Rhodobacter capsulatus SB 1003_gvpJ2 MDDRFSLRLFGPEEVFDAPSGGLADLLDGLLGHGIVLHGDLWLTVADVELVYVGLSAVLASPEALRSHE* 253
Rhodobacter sphaeroides 2.4.1_gvpJ1 MSFQMQSPLQQDSLADVLERILDKGIVIAGDISISLVGIELLTIRLRLLVATVDKAREMGINWWESDPRLCITQAPASDGSAALLDRLERIETQIGQLAAAREG* 254
Rhodobacter sphaeroides 2.4.1_gvpJ2 MTDSAPTLQFATAEEALQSSETRLVDVVDALLSQGIAIRGELWLTIADVDLVFLGLDLLLANPDRLQCRVPDAA* 255
Rhodococcus hoagii 103S_gvpJ MTRSGSGANYPQQYSQGLGGAGHEPANLGDILERVLDKGIVIAGDIRVNLLDIELLTIKLRLVIASLETAREVGIDWWEHDPWLSGNNRDLELENERLRARIEALESGERRVADVTDPHRAVQPAESPAAEVRDDDA* 256
Serratia sp. ATCC 39006_gvpJ1 MPVNKQYQDEQQQVSLCEALDRVLNKGVVIVADITISVANIDLIYLSLQALVSSVEAKNRLPGRE* 257
Serratia sp. ATCC 39006_gvpJ2 MSGNKKLTHSTDSTTVADLLERLLDKGVVISGDIRIRLVEVELLTLEIRLLICSVDKAVEMGLDWWSGNPAFDSRARVSSSAPAPELEERLQRLEARLEAAPSVIEETHL 258
Stella vacuolata_ATCC-43931_gvpJ1 MSGQRMEHSVQAVGLADILERVLDKGIVIAGDISISLVEVELLTIRLRLVVASVDRAMSMGINWWQSDPNLNSHARQLEEDNRLLRERLDRLEAALALPEMAGERLADAGQGGGAEQGVTHGR* 259
Stella vacuolata_ATCC-43931_gvpJ2 MSDPEPIIPRTSGDIALADLLDRALHKGLVLWGEATISVAGVDLVYLGLKVLVASTETADRMRAAAASQSADPKVRAG* 260
Thiocapsa rosea strain DSM 235 Ga0242571_11_gvpJ1 MMLAIGEHPDCPEEIQRVSLCEALDRILNKGAVVSGELTIAVANVDLLYLSLQLVITSVETAKREMLYVRH* 261
Thiocapsa rosea strain DSM 235 Ga0242571_11_gvpJ2 MSVQRSTLTHSTNSTSVADLLERVLDKGIVIAGDIRIKLVDIELLTIQLRLVICSVDKAREMGIDWWSDNAMFKGLSSQASAASLPGTAAASGIEDRLARLESLLVKQSAAAETVL* 262
Tolypothrix sp. PCC 7601_gvpJ MADILERVLDKGIVIAGDISVSIASTELLHIRIRLLISSVDKAKELGINWWENDPYLSSKSQRLVEENQQLQQRLESLEAQLRSLTAAKINNPELFPVNAEDNGQSDEENVPLPMNYQPND* 263
Trichodesmium erythraeum IMS101_gvpJ1 MFIRVDFLLDKGVIVDAWVRLSLVVIELLTIEAKIVIASVEAYLKYSEAFCFNY* 264
Trichodesmium erythraeum IMS101_gvpJ2 MAVEKVNSSSSLAEVIDRILDKGVVVDAWIRLSLVGIELLTIEARIVVAVETYLKYAEAVGLTTLAAAPGEAAA* 265
Trichodesmium erythraeum IMS101_gvpJ3 MAVEKVNSSSLAEVIDRILDKGVVVDAWVRLSLVGIELLTIEARIVIASVETYLKYAEAVGLTTLAAEPAA* 266
Trichodesmium erythraeum IMS101_gvpJ4 MKTSANIATSASGNGLADVLERVLDKGVVIAGDISVSIASTELLNIKIRLLISSVERAKEIGINWWESDPYFSSQNNSLVQANEKLLERVASLESEIKALRSN* 267
Trichodesmium erythraeum IMS101_gvpJ5 MKTSANIAKSAGGDSLADVLERVLDKGIVIAGDISVSIASTELLNIKIRLLISSVERAKEIGINWWESDPSLSSQNNSLVQVNQKLLERVASLESEIEALKYSQ* 268
gvpK
Anabaena-flos-aquae_gvpK MVCTPAENFNNSLTIASKPKNEAGLAPLLLTVLELVRQLMEAQVIRRMEEDLLSEPDLERAADSLQKLEEQILHLCEMFEVDPADLNINLGEIGTLLPSSGSYYPGQPSSRPSVLELLDRLLNTGIVVDGEIDLGIAQIDLIHAKLRLVLTSKPI* 269
Bacillus-megaterium_gvpK MQPVSQANGRIHLDPDQAEQGLAQLVMTVIELLRQIVERHAMRRVEGGTLTDEQIENLGIALMNLEEKMDELKEVFGLDAEDLNIDLGPLGSLL* 270
Ancylobacter aquaticus strain UV5_gvpK MTAPCTAETLENALRGRIDIDPEKVEQGLVKLVLMLVETVRQVVERQAIRRVEGGTLTEEETERLGLALMRLEEKMAELRLHFGLEDGDLDLKLQLPLGEL* 271
Aphanizomenon
flos-MVYSPVENSNDFLNVIPVENSNEFLNTSPKKKSNSETGLAPLLLTV 272 aquae NIES-81_gvpKLELIRQLMEAQIIRRMEEDLLSESDLERTAESLQKLEEQILNLCQIFDIDPADLNINLGDFGSLLPASGSYYPGETGNRPSILELLDRLLNTGIVVDGEIDIGVAQLDLIHAKLRLVLTSKPI* AphanotheceMSADESNLSQVNLNPATSNSDAGLAPLLLTVTELIRQLMEAQVIRR 273 halophytica (strainMDGGLLNEEELDRAGDSLQRLEAEIIRLCEIFEIDPKDLNVDLGELG PCC 7418)_gvpKTLMPKNGGYYPGESSDDPSILELLDRILHKGVVIDGNLDLGIAQLS LIQARLHLVLTSQPINGK*Aquabacter spiritensis MTGFAGGPAVTETLESVLQGRVDIDPERVEQGLVKLVLMVVETLR 274strain DSM QVIERQAIRRVEAGALTDEEIERLGLTLLRLEEKMAELRVQFNLSE 9035_gvpKADLSLKLRLPLGEL* BradyrhizobiumMSASSHSEAPGLRLQLGDLDTALAAVFTDAAPNGSINLDPDKIEHD 275 oligotrophicumLARLVLTLIEFLRRLLELQAIRRMEANELSEDEEERVGLALMRAAA S58_gvpKQVSRLARELGVDPRELNLQLGPLGRLL* BurkholderiaMNAPHAAAVSDAAALAAALEQALAQQQAPPPRATQRFDVATAS 276 thailandensis sp.AGNGLAKLVLALMKLLHELLERQALRRIEAGSLNDDEIERLGLAL Bp5365 strainMRQAEEIERLAAQFGFTDADLNLDLGPLGRLF* MSMB43_gvpK Chlorobium luteolumMHEDKVQFQASSVEEALRQLEGMKQGKESRIEANPDNVESGLAR 277 DSM 273_gvpKLVLTLIELLRKLMEKQAMRRIDGGSLDEAQIDELGETLMKLEMKM DELKKTFNLTDSDLNLNLGPLGDLM*Dactylococcopsis MSEEESNLSRVDLNPASSNSDAGLAPLLLTVTELIRQLMEAQVIRR 278salina PCC MDAELLTEAELDRAGESLQRLEEEILRLCEIFDVDPADLNVHLGEL 8305_gvpKGTLLPKEGGYYPGETSDQPSILELLDRVLHTGVVIDGNLDLGIAQL NLIQAKLHLVLTSQPINN*Desulfobacterium MIKDPEAKDFKIESDSIDAFARVMHADTSSCSSSSVTAGQRQQRLK 279vacuolatum_DSM IDEENIKNGLAQLVMTLIKLLHELLERQAIRRIESGSLDDDQIERLG 3385_gvpKLTLMQQCEEIDRLRKLFDLEEEDLNLDLGPLGKLL* Desulfomonile tiedjeiMNPMNIAKVESDSLGDFAEIMQTDWISSLHSDKEEKRLNLNQDSV 280 DSM 6799_gvpKKNGLGQLVLTLVKLLHDLLERQAIRRMEAGTLTDTEIDRLGTTLMMQAQEIERLRSEFGLEEEDLNLDLGPLGKLL* DesulfotomaculumMYIDISEGSLKQGVLGLLLALVEIIKDALKIQALKRIEGDSLTEDEIE 281 acetoxidans_DSMRLGNALHELEEALVEIEMEHNLQNVVQNIREGLDNVVNEVVDTFN 771_gvpK PERWIAENEFN*Dolichospermum MLSTPADNFDESLTTVSKSKNEAGLAPLLLTVLELLRQLMEAQVIR 282circinale_gvpK RMEDNLLSESELERAADSIQKLEEQILHLCETFEVDPAELNINLGDFGTLLPQSGSYYPGETGSRPSVLELLDRLLNTGVVLDGEIDLGLAQL DLIHAKLRLVLTSKPI*Enhydrobacter MTKLLEAKTVDPDKAGDDLVKLVLALVETLRQLVERQAIRRVDS 
283aerosaccus strain GVLNDDEVERLGLALLRLEEKMSELKAHFGFGDEELTLKLGSLGEATCC 27094_gvpK LARDV* IsosphaeraMSDSLFEVRSPSAAPPSPVNPGVADEWTAVLKDWDTLTAQLRQA 284 pallida_ATCC-TAPPNAENSARSHATTGRIDLDPEQVGDGLAKLVLTLLELIRQLLE 43644_gvpKRQAIRRLDAGSLDHEQTERLGLTLMRLAQRMEELKTHFGLQGEDL NLDLGPLGKLL*Legionella drancourtii MNDKREEDNALPQRINLQPDDVKNGLGKLVLILIQLIHELLERQAI285 LLAP12_gvpK GRIEAGDLSDEQIDRLGITLMKQAEEIDKLREVFGLTQEDLNLDLG PLGKLL*Microcystis aeruginosa MTLACTPYDSDNQALLTRPESNSQAGLAPLLLTVVELVRQLLEAQI286 NIES-843_gvpK IRRMEKGVLSESDLDRAAESIQKLQEQILYLCEIFEVEPEELNVHLGEFGTLLPEAGSYYPGEEGIKPSVLELVDRLLNTGVVVEGNVDLGL AQLDLIHLKLRLVLTSQPV*Nostoc punctiforme MQAISKSKGSDSGLAPLLLTVVELIRQLMEAQVIRRMDAGTLNDS 287ATCC 29133_gvpK ELDRAAESLQKLEQQVVQLCEIFDIDPADLNINLGEMGNLLPQSGGYYPGETSSQPSILELLDRLLNTGVVVEGDLDLGLAQLSLVHAKLRL VLTSKPL* Nostoc sp. PCCMVCTPVEKSPNLLPTTSKANSKAGLAPLLLTVVELIRQLMEAQVIR 288 7120_gvpKRMEQDCLSESELEQASESLQKLEEQVLNLCHIFEIEPADLNINLGDVGTLLPSPGSYYPGEIGNKPSVLELLDRLLNTGIVVDGEIDLGLAQLN LIHAKLRLVLTSRPL*Octadecabacter MKTTSDSQFDSMKKILTDSSKEDSASCDPTDLLPNKSLPPSLSTSPE 289antarcticus 307_gvpK TAADDLVKLVLAVIDTVRQVMEKQAIRRVESGALAEAEIERLGLTLMRLEARMVELKSHFGLSNEDLNLHFGTVQDLKDILNDEE* OctadecabacterMKTQNDTQFDSMKKILTDSGGGDPNPNGSPDQTQHASLPSNLSTD 290 arcticus 238_gvpKPETAADDLVKLVLAVIDTVRQVMERQAIRRVDSGALADEEIERLGLTLMRLEERMADLKSHFGLSNEDLNLNFGTVQDLKDILNDEE* PelodictyonMDSDKILYYAGSADEIIEELEKLKPGIQGRINATPDNVESGLAKLVL 291 phaeoclathratiforme_TLIELIRKLIEKQAMRRIDGNSLSESQIEELGETLMKLEKKMEELKG gvpKIFNLTDKDLNLNLGPLGDLM* Phormidium tenueMTSENAEPDLSTTLALQPPAKTDAGLAPLLLTVIELVRQLMEAQVI 292 NIES-30_gvpKRRMESGDLDDNDLERAADSLRKLEEQVVSMCEIFDVDPADLNIDLGEIGTLLPKEGNYYPGQKNQNPTILELLDRLLDTGVVVEGDVDLG MAQLNLIHAKLRLVLTSKPI*Planktothrix agardhii MSSSEPSIETIITPKSSRKDAGLAPLVLTLVELIRQLMEAQVIRRMEG293 str. 
7805_gvpK NTLSEEELDRAAQSLQQLEIQVLKLCEIFEIDPTDLNIELSEFGTLLPKSGSYYPGENTQNPSILELLDRLMNTGIVVEGSVDLGLAQLNLIHA KLRLVLTSKPL* PsychromonasMPFEHFKSNNQADVNSDTKPAASVGGLNLESDDLKNGLGRLVLT 294 ingrahamii 37_gvpKLVKLLHELLERQALRRMDAGSLQDDEIERLGLAFMKQAEEIDRLR KEFGLEVEDLNLDLGPLGRLL*Rhodobacter MSAAMHLELGDVDAVLSQAARSLAAGGRLTLDPERVEQDLARLV 295capsulatus SB LGIVELLRKLMELQAIRRMEAGSLTPEQEETLGLTLMRAEAALHE 1003_gvpKVAAKFGLQPADLILDLGPLGRSV* RhodobacterMTYPFPPLLLRDDRLPPTEAPVTAPRIALDPDRLEHDLARILLGLME 296 sphaeroidesMLRQIMELQAIRRMEAGSLSESQQEQLGTTLMRAEAAIHEMAARF 2.4.1_gvpKGLTPADLSLDLGPLGRTI* Rhodococcus hoagiiMRRRIDSDPESVERGLVALVLTLVELLRQLMERQALRRVDAGDLS 297 103S_gvpKDDQIERIGTTLMLLEEKMEELREHFGLEPEDLNIDLGPLGPLLAED* Serratia sp. ATCCMTTNQLSHHSPVFGPTSPAIQRPITEANRHKIDIDGERVRDGLAQL 298 39006_gvpKVLTLVKLLHELLERQAIRRMDSGSLSDEEVERLGLALMRQAEELT HLCDVFGFKDDDLNLDLGPLGRLL*Stella MTGFLNGPADVETLETALRGRVDIDPERVEQGLVKLVLMVVETLR 299 vacuolata_ATCC-QVIERQAIRRVESGSLTDDEVERLGLTLMRLEEKMDQLRRQFDLG 43931_gvpKEEDLSMRLRLPLQEL* Thiocapsa rosea strainMSDTRTGTAPSSAASAAPDTSTLQRANLLADLLETKVAAAGRRIDI 300 DSM 235DPERVQRGLGQLVLTVVKLLHVLLERQAIRRVDGGDLDEDEIEQL Ga0242571_11_gvpKGLALMRQSEEIERLRRLLGLEEQDLNLDLGPLGKLF* Tolypothrix sp. 
PCCMAMVCTPSENSNDLLATNSKANNQAGLVPLLLTVVELIRQLMEA 301 7601_gvpKQVIRRMEEECLSESDLERAAESLQKLEEQVLNLCQIFEIDPADLNIHLGELGSLLPAAGSYYPGETGNTPSVLELLDRLLNTGVVVDGELDL GVAQLNLIHAKLRLVLTSKPLNTK*Trichodesmium MSLENSPEESLIVPIDKSKSNPEAGLAPLLLTVIELLRELMQAQVIR 302erythraeum RMDAGILSDEQLERAAEGLRQLEEQVIKLCKVFDIPTEDLNLDLGE IMS101_gvpKIGTLLPKSGEYYPGEKSENPSVLELLDRILNTGVVLDGTVDLGLAE LDLIHARLRLVLTA* gvpLAncylobacter MLYLYAILESPPPQKPLPPGIGGAAPLFVESHALVCAASEAADAAI 303aquaticus strain AREPSQIWRHQEVVAALMEGRPVLPLRFGTVVEDSAACLRLLARH UV5_gvpLHAELSAQLDRVRHCVEFALRVAGLSELADPGLDPNATPAGLGPGASHLRTLVRRERGWPVSSAAFPHDTLTAHAASRLLWARSPSQPDLRASFLVQRRSASAFLDDVNALQRLRPDLGITVTGPWPPYSFSDPDLS GGRE* AphanotheceMLYTYCFLFSPEKTLSLPQGFKGDLQMIEKGAIAAVVEPNLPKAEL 304 halophytica (strainEEDDQKLVQAVVHHDWVICELFRGLTVLPLRFGTYFRGEADLRSH PCC 7418)_gvpLLAAYEESYQQKLTALTGKVEVTLKLTPIPFSEEGSSSTAKGKAYLQAKKQRYQQQSNYQTQQQEALEKLQEEIKKTYPQLIHDEPKENTERFYLLIDSHSFSVFGEKMEQWKQFLSSWSIEISDPLPPYHFL* Aquabacter spiritensisMLYLYAVLEAPPPARSLPPGIGGGAPHFIEAFELVCAASETPNRSV 305 strain DSMAPEPAEVWRHQQVVEALIDRAPALPLRFGTLVEDASACRRLLTRH 9035_gvpLRDALGAQLGRVRHCVEFALRVSGLPEEVAPDPGIGGGPGTSYLRTLARREAGWPPSTAVFPHDGLAAHAAERLLWARSTSQPDLRASFLVRKPNVAAFLADVSALQRVRPDLGITCTGPWPPYSFSDPDLSGVSP* Bacillus-MGELLYLYGLIPTKEAAAIEPFPSYKGFDGEHSLYPIAFDQVTAVV 306 megaterium_gvpLSKLDADTYSEKVIQEKMEQDMSWLQEKAFHHHETVAALYEEFTIIPLKFCTIYKGEESLQAAIEINKEKIENSLTLLQGNEEWNVKIYCDDTELKKGISETNESVKAKKQEISHLSPGRQFFEKKKIDQLIEKELELHKNKVCEEIHDKLKELSLYDSVKKNWSKDVTGAAEQMAWNSVFLLPSLQITKFVNEIEELQQRLENKGWKFEVTGPWPPYHFSSFA* BurkholderiaMNDALYLFCFARAEPLAPAWAKRAPGEPRLQLLHEGNLAAVLCD 307 thailandensis sp.VSRSEFAGADAERRLADPAWIAGRVAVHAAAIEWTMRYSPVIPAQ Bp5365 strainFGTLFSGAGRVIALMESCHAHIGRVLDHVEGKTEWAVKGWLDRQ MSMB43_gvpLAAADSQAALLRADEPESAARTAGARYLRERQLQARAGQNLRDWLEQSVPPISARLQRHAVEMCSRPCRASDSEHEIVANWAFLVRNRDVPAFRRQAEAIDAEFATWGLHFDFSGPWPPYSFCAPLTEETTWSG* Chlorobium luteolumMPCRLTVTWKSLRTAGLLPTAKGIQGRTERMAQNILYVYCIVRQL 308 DSM 
273_gvpLPGADIVARYPDLVFIEAGSAYVAAKYVSPLEYSDASMKLKLADEEWLDRNAREHLSVNVMIMAQQTIIPFNFGTIFKSRESLSGFLGDYGRKLDESFDALEGREEWAVKAYCNESFLLKNLHLESPAIAAIEQEIQAASPGKAYLLKKKKEAMSASALEGVHQGHAKAVWGELAALSKEHVLNRLIPEDVSGVDGRMIVNGVFLIANTDVGAFIRTTEDLGERYRD AGVFLDVTGPWPPYDFVDIPY*Dactylococcopsis MLYTYCLIASSPSALSLPSGFRGELQLIKQGAIAAIVEAELPLEELEE 309salina PCC 8305_gvpL NDQKLIQAVIHHDAVICEIFQQIPLLPLRFGTYFPTEKDLLEHLDFKAEKYQKKLQEIQDKVELTLKLTPLPFSTENASPMEKQGKNYLKAKKQRYQEQTNYQSQQQAELNQLQTQINQDYPQFIHGEPKENIERFYLLIKERDRSVFSEQLEQWKKDFPTWTIEVSDPLPPYHFIE* DesulfobacteriumMEKKKAVYLYCVTRANKFNAPGITGIDANTPVCFEHLENFVAVY 310 vacuolatum-DSMNIIPLNTFVGTSAEENMKNIDWIGPRAMRHENVIERMMQESSVYPA 3385_gvpLRFATLFSSMENLRETLHLKSGLISRFLNQTQHKCEYSLKGFINRKQLLEFLIKTKFKQEKKQLDGLSPGKKYFAQHQFNKKVETGINQWIKRRCGIFLDHLTKRNPEVSPRELFTEKTEKNNLEMMFNLAFLIHNDSKSAFLQEISQAEKEFSQTGISLVVSGPWAPYSFCKTTRGEGL* Desulfomonile tiedjeiMSNVLYLFCLARTGLVDHIEGTGITGTEDLILKNFSGVTAVTCEVP 311 DSM 6799_gvpLEDDFSGESAEIKLQDLAWVGPRAVRHDRIIEEIMQYSPVFPAPFGSLFSSEKRLGTLIESNIDAIREFLDHTADKQEWSVKGLVCKSKAVDEIFTGKLKILSETLSSSPAGMRYFKERQMRSEAEKELSGKVKAACTVVGEKLLACSNNFRQRKNISFGKAEGDKQLVVNWAFLVDHSRISYFLDQVEHANSNYQAGGLAFECSGPWPPYSFCPSLHMEPTR* DesulfotomaculumMNLIDDCKAKYIYCIGENPGNWPSEVMGVEGSLVYHVVYRDIAA 312 acetoxidans-DSMVVHDCAEQPYNSDDNNKVIDWVLGHQLVVDKACSCYSSVLPFTF 771_gvpLNSIVKGKEDLSSHEILVNWLEDNYDNFKLKLGKIKGKKEYSVQLFLDKQVSLSLLQSESDILELQVELLGSAKGKAYFVQEKINKKIGELMANRADSYCRQFYHEISSVVSECKLCKLKQAGRNEIMIINLVCLAGDNEVEVLGDVLEKIKSNDIAIKIKFSGPWPAYSFV* EnhydrobacterMLYVYGIADNAFEVLRGAGLLNSDVFAVPAGCLAAAASKLAQGG 313 aerosaccus strainIETTPQGVWRHEQVLRQLMQDHAVLPLRFGTICRDRETLTDRLME ATCC 27094_gvpLASDDLVRGLGRVRGKVEIALRIVDEREHEAHPVPSETPTVDAIGGGRGTAYLRARRRHHAAEMGREARAERVGKMLSAYIDVGAEDLVCSVAPEGDHAVSVSCLLGRDQLATLQAALERFQSDHPAIGLSWTGP WTPYSFVAPSLFGVGLP*Legionella drancourtii MNKALYLFCLTPASDLPMMEGELLPNFSPLFIHPFQTFNAILSWVP314 LLAP12_gvpL 
AKEYQEQSTDSNLINTEEFMQRVFFHELVVEKIMRDEAVFPIGFGTLFSSIASLEEQILTHQTLISSCLANLNQKDEYAVRVYLNQDKALESLLSVMLQERESSWASSSPGVQYLKKQQLHNEIQRNLNQHLGGMLDEVLSMFQRHATDFKSRENTAQSSDIHGTSILHWAFLIPRVVSSIFKEQVDLMNAKYNPFGLHFVLTGPWPAYSFCTLQSVEAP* Lyngbya confervoidesMRWHRSEAVISYCDLSMIYLYALCPNSTETNNLPEGIGTAQVEVLT 315 BDU141951_gvpLVGTLGAVIERDVDIAQIQKDDAQLMAAVLAHDRILSHLFTYSPLLPLRFGTQFSNSEAVTTFLKTQGETYRQKLSHLQDRAEYLVKLIPQPLDLPAIASDLKGREYFLAKKQRLQDHTAALNQQADELQTFLTDLATQDIPLVRSAPQDHEERLHVLLSRDTDTTEQVIMTWQEQLPNWQVV CSEPLPPYHFAA*Octadecabacter MKRLYVYGIVGATSFDDPLPNGHDEASVFALVSGDIAVAVSFVER 316antarcticus 307_gvpL SAVEASAANVWLHDNVLSALMTRYAVLPMRFGTIAVGATQLLEGIVKRQKQLMKDLMRLNENVEIALHISGKNWEKVNQKVTKKNTDQAITQGTAYLLGRQQSLYGSDKTQLLVQNVRRAIRSGLDPLMKDVIWPIDKPQALPFKASCLINRNDVASFVQIVNDIAAQNLDARVTCTGP WAPYSFVGKSGVEGET*Octadecabacter MTKLYVYGIVGATHFDVKLPNGHDEAPVFAIVSGDLAVAVSSLER 317arcticus 238_gvpL SAVEASAANVWLHENVLSALMEGHAVLPMRFGTIATGAAQLLGDIVKRRGQLMKDLTRLDGKVEIALRISGKNREKVEQRIAGQIVDTNVTQGVAYLQEKQQNLYGSFYTQSSVQCARRAIRSQLDPFIVEAIWPTDEPQMLPFRASCLIKKGDIARFVQTVDDVVVKVSDIRVTCTGPWA PYSFVGQSGSEAET*Pelodictyon MVAIQERLIYIFCVTSEPPLLQQYQLQKGICVVDVDGLFVTTMDVT 318phaeoclathratiforme_ DNDFAENQLQSNLSDVVWLDTKVREHLDVITSIMQHVKSLIPFNF gvpL1GTLYKSESSLMQFIIKYAEEFKKNLVYLEEKEEWAVKLYCNKNKIVENITHLSKKVSDINALIQNSSIGKAYILGKKKNEIIENEIINIYNTYSKKIFTKFSILSEEFRFNPIPNNETLEKEDDMILNVVLLLNKANVESFIETSDQLIIQHQNIGLNIEITGPWPCYSFINISH* PelodictyonMPLIIYAIFDSINYIDSFSSYVDAISLKSKIKLEIISTSTLSAIVSRTTDE 319phaeoclathratiforme_ KKQACQNDVMIYATIIGDIAAKYSILPMRYGSIVSSPFDVTELLKNgvpL2 HNETFVTIIKKITDKEEYSLRILYSHQDKEKNNIEDLFDLPQNVPDILHGNTDSKKYLLNKYIKHLSEEKRLQYIDKIQSIVACNLQKITDLIVYNKQTTTGFIVDAVFMIERSKKSELLDLVIQMQTLFSEHNVVLSGPW PPYNFSNINIG* PsychromonasMKNSNHSGLDPNQALYLYCFVHADSIQSVTSQAIEKDSPVFIYQW 320 ingrahamii 37_QDIAAVLSHVPTSYFTGYDDEEPEQTIARILPRTQLHEQVIEEVMRQ gvpL1SPVFPAQFGTLFSSQESLEQEISQQYLAITHTLKEVSGSVEWAVKGVLDRGVAEKALYSQQLTEQQNSLSSSPGMRHLQEQRLRRETQSKLNSWLHQLYTDIATPLSELSGDFFQRKIPSSIEEGKEVILNWAFLVPESAGDDFHAQIDKLNQRLNSFGLVIQCSGPWPPYSFCNQSS* 
PsychromonasMKNSNHSGLDPNQALYLYCFVHADSIQSVTSQAIEKDSPVFIYQW 321 ingrahamii 37_QDIAAVLSHVPTSYFTGYDDEEPEQTIARILPRTQLHEQVIEEVMRQ gvpL2SPVFPAQFGTLFSSQESLEQEISQQYLAITHTLKEVSGSVEWAVKGVLDRGVAEKALYSQQLTEQQNSLSSSPGMRHLQEQRLRRETQSKLNSWLHQLYTDIATPLSELSGDFFQRKIPSSIEEGKEVILNWAFLVPESAGDDFHAQIDKLNQRLNSFGLVIQCSGPWPPYSFCNQSS* Serratia sp. ATCCMTMNTEAQTEQAIYLYGLTLPDLAAPPILGVDNQHPINTHQCAGL 322 39006_gvpLNAVISPVALSDFTGEKGEDNVQNVTWLTPRICRHAQIIDSLMAQGPVYPLPFGTLFSSQNALEQEMKSRATDVFVSLRRITGCQEWALEATLDRKQAVDVLFTEGLDSGRFCLPEAIGRRHLEEQKLRRRLTTELSDWLAHALTAMQNELHPLVRDFRSRRLLDDKILHWAYLLPVEDVAAFQQQVADIVERYEAYGFSFRVTGPWAAYSFCQPDES* Stella vacuolata-MLYLYAVLEALPAARTLPAGIGGGELLFVEAFELVCAASETPERAI 323 ATCC-43931_gvpLAPEPTQVWRHQQVVEALIDCAAALPLRFGTLVEDAVACRRLLTRHREALCAQLDRVRHCVEFALRVSGLREEVGSDHVIGGGPGVSYMRALARREASWPPSTGTFPHDGLAAHAADRLLWSRSASQPDLRASFLVLKPNVAAFLADVSALQRMRPDLGITCTGPWPPYSFSDPDLSGMS P* Thiocapsa rosea strainMDAFYCFCFAPACLASDLRFDDCGWEDPIEIRRLAGLDVILSRVPL 324 DSM 235 Ga0242571-GRFAGAEAEQRLADLEWLVPRAQAHDRVITRTMERSTVFPLTFAT 11_gvpLLFSSLPALALEVAARRRALLDFFERMAGREEWAVKVSMDRERVIATRMQSLYPEGGDVPAGGRGYLLKQRRRGEAEQAIGPWLKGQIGCLDEALRPSCETLLIRPLRDEMVASRACLVARDLGPSLSEAIERSREA FADQGLDLHCSGPWPLYSFCGTP*Trichodesmium MSYYVYGFLYLPESCLALPKGMEKEVELVPYQNIAAVVEANVSIE 325erythraeum AIQETEEKLLEAILAHDRVVREIFQQVSMLPLRFGNAFALRENIIND IMS101_gvpLLQNNQQQYLNILTKLQQQAEYTITFTPVSYPSTLEVSKVRGKAYLLAKKQQFEQQQAFQTKQRQQWENIRQLIFKNYPKAVFRDSTESKIKQVHLLANRDARVITTEELSTWQTECSYWQITLSEQLPPYHFV* gvpN Anabaena-flos-MTTTKVNHKRAVLRLRPGQFVVTPAIERVAIRALRYLKSGFPVHL 326 aquae_gvpNRGPAGTGKTTLAMHLANCLDRPVMLLFGDDQFKSSDLIGSESGYTHKKVLDNYIHSVVKLEDEFKQNWVDSRLTLACREGFTLVYDEFNRSRPEVNNVLLSALEEKILSLPPSSNQPEYLSVNPQFRVIFTSNPEEYAGVHSTQDALMDRLVTISMPEPDEITQTEILIQKTNIDRESANFIVRLVKSFRLATGAEKTSGLRSCLMIAKVCADNNIPVTTESLDFPDIAIDILFNRSHLSMSESTNIFLELLDKFSAEELEILNNRVTGDNDFLIDNSQ FVSQQLAGQPN*Ancylobacter MTSEAASKDPISLLSGFGAGAASSGPKAGGRSTPSALTPRPRTGFV 327aquaticus strain EAEQVRDLTRRGLGFLNAGYPLHFRGPAGTGKTTLALHVAAQLG 
UV5_gvpNRPVIIITGDNELGTADLVGSQRGYHYRKVVDQFIHNVTKLEETANQHWTDHRLTTACREGFTLVYDEFTRSRPETHNVLLGVFEERMLFLPAQAREECYIKVHPEFRAIFTSNPQEYAGVHASQDALADRLATIDVDYPDRAMELAVASARTGMPEASAARIIDLVRAFRASGDYQQTPTMRAGLMIARVAAQEGFEVSVDDPRFVQLCSDALESRIFSGQRAEEVAREQRRAALHALIDTHCPSAAKPRARRAGGAVRASIEGAQS* Aphanizomenon flos-MTKTNHKRAVLRVRPGQFVVTPAIEQVAIRALLYLKSGFPIHLRGP 328 aquae NIES-81_gvpNAGTGKTTLALHLAHCLDRPVMLLFGDDEFKSSDLIGSESGYTHKKLLDNYIHSVVKVEDEFKQNWVDSRLTLACREGFTLVYDEFNRSRPEVNNVLLSALEEKILSLPPSSNQPEYLSVSPQFRAIFTSNPEEYCGVHSTQDALMDRLVTINMPEPDEITQTEILIQKTNIQKESAHLIVRLVKSFRIATGAEKTSGLRSCLMIAKVCADNNLVAEPENSFFQEIAMEILSNRTHLSVNESTDIFLDVISQFSNKEIEILNDAELGSLPTMDTLANTDLGNDVPLEKEASDYVIQQKNNEFKGFQKPSTKVLN* AphanotheceMTTVLHARPKGFVSTPTIDRISRRAWRYLQSGFSIHLRGPAGTGKT 329 halophytica (strainTLAMHLADLLNRPIMLLYGDDEFKSTDLIGSNTGYTRKKVVDNYI PCC 7418)_gvpNHSVVKEEDELRQQWVDSRLTMACREGFTLVYDEFNRSPPEVNNVLLSALEEKLLVLPPDSHRSEYVRVSPNFRAIFTSNPEEYWGVHGTQDALLDRVVTINVPEPDLETQREIIVQKVGINADDGDMIVNFVRNFRDRAEMENSSGLRSCLMIAQVCHQHEIPVQTSNEDFQDICYDILTSRCPLSTQESISLLEQLFREYELELVVEDEDEDVPSVIVEGETEDLSSDE KPHLRLSHPFGNTEND*Aquabacter spiritensis MSTEPAPLVSPSQDVETTPQRPARPEPAEALAVGYRLSARPASPAT330 strain DSM LTPRPRADFVETDQVKDLTRRGLGFLRAGYPLHFRGPAGTGKTTL 9035_gvpNALHVAAQLGRPVIVITGDNELGTADLVGSQRGYHYRKVVDQFIHNVTKLEETANQRWTDHRLTTACREGYTLVYDEFTRSRPETHNVLLGVFEEKILFLPAQAREECYIRVHPDFRAIFTSNPQEYAGVHASQDALADRLATIDVDYPDRGMELAVASARTGLGETEAARIIDLVRAFRASGDYQQTPTMRASLMIARVAAQEGLRVSIDDPGFVQLCMDALESRMFSGARLEAATRETSRAALLALLAVHCPSEAPIVRVTAARRAKKA DAS* Arthrospira platensisMTTVLRAVPKGFVNTPAIERITVRALRYLQSGFSVHLRGPAGTGKT 331 NIES-39_gvpNTLALHLADLLNRPIMLIFGDDELKSSDMIGNQTGYTRKKVVDNFIHSVVKLEDSLKQNWIDSRLTLACREGFTLVYDEFNRSRPEVNNVLLSALEEKLLVLPPNNSRSEYIRVNPHFRAIFTSNPEEYCGVYSTQDALLDRLITMNMPEPDEATQQEILIQKVAVTPEEAQTIVTLVQQFREATHAIAPSKIQTVARQQTNADKASGLRPSLMLARICQEHNIPIVPIDPDFQEVCRDILLSRAIGDITELESRLHQIFDHLSGLENDQIIALPPREELTTSSVPNNLSDTEQKIYTYIKDSDGARVSEIEIALGLNRVQTTDALRS LLRKSYLTQQDNRLFVVYEGD*Bacillus- MTVLTDKRKKGSGAFIQDDETKEVLSRALSYLKSGYSIHFTGPAG 332megaterium_gvpN 
GGKTSLARALAKKRKRPVMLMHGNHELNNKDLIGDFTGYTSKKVIDQYVRSVYKKDEQVSENWQDGRLLEAVKNGYTLIYDEFTRSKPATNNIFLSILEEGVLPLYGVKMTDPFVRVHPDFRVIFTSNPAEYAGVYDTQDALLDRLITMFIDYKDIDRETAILTEKTDVEEDEARTIVTLVANVRNRSGDENSSGLSLRASLMIATLATQQDIPIDGSDEDFQTLCIDILHHPLTKCLDEENAKSKAEKIILEECKNIDTEEK* BradyrhizobiumMLRSDRAAIAGGQRGSRAQGDAVARNDAAAGSRAAIAQISPRPD 333 oligotrophicumADNAALSPAPRTDLFENPQLASMAARALTYLNAGIPVHLRGPAGT S58_gvpNGKTTMAMQLAARLGRPVVLLTGDDGLTAAHLVGREIGTKSRQVVDRYVHSVRRVETETSSMWCDAVLAQAVVEGLTFVYDEFTRSPPQANNPLLSVVEERILIFPAGSRKERLVHAHPEFRAILTSNPEEYAGVSRPQDALLDRLITFDLDDYDRETEIGIVSNRTGLAYAEAGVIVDLVRGVRRWPKAHHPPSMRSAIMIARIVARELITPSVDDPRFVRLCLDVLAAKAKPTDRDDRDRFAATLLRLMNNHCPAGAIDGG* BurkholderiaMEASAEFVQTPAVRNLTERALTYLGAGYGVHLAGPSGTGKTTLA 334 thailandensis sp.FHIAAQLGRQVVLMHGDDELGSADLVGRGAGYRRSRVVDNFIHS Bp5365 strainVVKTEEEMTTTWIDNRLTTACQHGLTLIYDEFNRSRPEANNALLP MSMB43_gvpNVLSEGILNLPNRMTGAGYLTVHPGFRAIFTSNPEEYVGVHKTQNALMGRLITIQVGHYDRETEVEIVRARSGIARADAERIVDLTRRLRDADDNGHHPSIRAAIALARALSYCGGEATPDNAGYVWACRDILGVDL EQDARTRSQAGRRTKARR*Chlorobium luteolum MRAAVNDNEMNTVLAPRPMANFVETEYIRDITERGLTYLKAGFPV 335DSM 273_gvpN HFRGPSGTGKTTVAMHLAGKIGRPVVVIHGDSEYKTSDLIGSEQGYKFRRLNDNFIHSVHKYEEDMSKQWVNNRLSIAIKKGFTLVYDEFTRSRPEANNILLPILQEKMLSTSASNEEDYYMKVHPEFRAIFTSNPEEYAGVNRTQDALRDRMVTMDLDYFDYETELRVTHAKSELTLEDSEKIVQVVRGLRESGKTEFDPTVRGSIMIARTLHIMQVRPEKTNDAVRKVFQDILTSETSRVGSKTNQEKVRAIVNDLIEAYL* DactylococcopsisMTTVLHARPKGFVSTPTIDRISGRAWRYLQSGFSIHLRGPAGTGKT 336 salina PCCTLAMHLADLLNRPIMLLYGDDEFKSTDLIGSNTGYTRKKVVDNYI 8305_gvpNHSVVKEEDELRQQWVDSRLTMACREGFTLVYDEFNRSPPEVNNVLLSALEEKLLVLPPDSNRSEYVRVSPNFRAIFTSNPEEYWGVHGTQDALLDRVVTINVPEPDLETQQEIITQKVGINANDGEKIVNFVRQFRDRAAVKNSSGLRSCLMIAQVCHQHEIPVQTSDEGFRDICYDILSSR DesulfobacteriumMSASMSSMKETRQRMSAPEQDNVVPEAGSDFVETPYVKDITDRA 337 vacuolatum_DSMLAYLHVGYPVHFSGPAGTGKTTLAFHVAAKLKRTVMLIHGDDEF 
3385_gvpNGSSDLIGKDSGYRKAKVVDNYIHSVVKTEESMNTVWADNRLTIACQQGCTLVYDEFTRSRPEANNAFLSVLEEKILNIPSLRDIDQGYLQVHPEFRAIFTSNPEEYAGVHKTQDAMMDRLITITLDHFDRDTEVQVTMSKSDLPQKDAEKIVDIVRKLRKTGVNNHRPTIRACIAIGKILKHMGGGASKDNFVFKQICRDVLNVDTTKVTRDGEPLLPRKIDELINSL* Desulfomonile tiedjeiMNGAELRIASIETEVITANNENIVPEAGDRFVNTPHVEELTARAMA 338 DSM 6799_gvpNYLEVGYSVHFSGVAGTGKTTLAFHAAAKLGRPVILVHGDHEFGSSDLIGRDAGYKKSRLVDNFIHSVVKTEEEMRSLWVDNRLTTACRDGYTLIYDEFTRSRPEANNVLLSILEEKILNLPSLRRTGEGYLEVHPSFRAIFTSNPEEYAGVHKTQDALMDRIITINVDHYDRETEIEITRAKSGVCKQDATVIVDIIRELRLLGVNNHRPTIRAAIAIARVLAHTGEHADQHNSVFQWLCKDVLSTDTVKVSRGGSPLMAKKVEEVIRKVCGRT GGKRSGKPVGSKEETSE*Desulfotomaculum MQLNGLDKNSIINPVVLSDFVVTDYISNVVDRALAYIKAGFAIHLR 339acetoxidans_DSM GRSGTGKTSIAMYISSKLNRPTLVIHGDEEFRTSDLIGGRYGYRIRK 771_gvpNTIDNFVQSVVKVEEDLVERWVDSRLTTACKNGYTLVYDEFTRSRPEANNILLSVLQERLLDISVARGAEEGYVKVHPDFTAIFTSNPEDYAGVYGSQDALRDRMVTLDLDNYDKETEISIIKSKSKLSREDSERVVNILRDLRELGDCEYGPTIRGGIMIAKTLQVLGAPVDKNNEMFRQICEEVLASETSRAGNLQALRKVRKVINELFNKYA* DolichospermumMSITKVNHKRAVLRLRPGQFVVTPAIERVVIRALRYLRSGFPIHLR 340 circinale_gvpNGPAGTGKTTLGMHLANCLDRPVMLLFGDDQFKSSDLIGSESGYTHKKLLDNYIHSVVKVEDEFKQNWVDSRLTLACREGFTLVYDEFNRSRPEVNNVLLSALEEKILSLPPSSNQPEYLSVNPQFRVIFTSNPEEYCGVHSTQDALMDRLVTINMPEPDEITQTEILIQKTNIGRESANLIVRLVKSFRLATGAEKTSGLRSCLMIAKICADHDIPASTEDLDFREIAIDILFNRAQLSISESTDIFMGLLEQFSAEEIKVLNDTHFPTDELLINNSQFITQELVTQPNTELATDIPQELRKTEQN* EnhydrobacterMSMDQAEEIGVVTTIEPRPRADFVRTQSVEATARRALGYLNAGFS 341 aerosaccus strainVHFRGPAGTGKTTLALHLAALLGRPMVMITGDEEMLTSTLVGTQ ATCC 27094_gvpNHGYHFRRVVDRFIHTVTKTEETADKRWADHRLTTACREGYTLIYDEFTRSRPEANNVLLSVLEEGLLVLPAQNQNEPYIKVHPNFRVIFTSNPQEYAGVHDAQDALGDRIVTIDMGHADRELELAIAAARSGLPPTQVAPIVDMVREFRETGEYDQTPTLRTSIMICRMMSQERLAPTIEDQQFVQICMDILGGKSLPGGKGDNKRAQQQKMLLSLIEHHCPARSFTS VGEV* IsosphaeraMDYESTALQLKPRPDFVATPWVRELADRALGYLTAGYPVHFSGP 342 pallida_ATCC-AGTGKTTLAMHLAALVNRPVVLLHGDDEFGSSDLVGDHLGFRST 
43644_gvpNKVVDNFIHSVVKTEQSVSKTWVDHRLTTACRHGFTLIYDEFNRSRPEANNILLTILEERLLELPPIAGGRDGSGPLRVHPEFRAIFTSNPEEYAGVHKTQDALLDRMITISMGGHDEATETEITAAKSGLSRDEAARIVELARAVRALKPLRHPPTIRSCLMIAKVAALRKVPIDPNDALFLAICRDVLRIDALPVDDPEATFAELIRRVFAPTPAVAPPRVPTTGFAANRVVPIPRRPLAASASPPPGANGHAHLR* Legionella drancourtiiMMTQENNGSLTDSKNNDKLIRFVNNRSDNILLEASEEFTETPHIRGI 343 LLAP12_gvpNSERALAYLDIGYPIHLLGPAGTGKTTVALHIAAQLGRPVILIHGDDEFTGADLVGRGTGYHHSKLVDNFIHSVLKTEEEMTTMWTDNRLTTACEQGYTLIYDEFNRSRAEANNALLSVLSEGILNLPGRRERDGIGYVDVHSNFRAIFTSNSEEYVGIHKTQNALADRLIAIKMDYPDQQSEIQIIEKKSTLPRKDIEIIVNLARELRLKSEKRPSIRGCIAIARVLAYHNRHAHADDPIFQAVCQDIFGISKEFLKQLLHPMDSGLQKRSEKNQESI KKYKTKNQKL*Lyngbya confervoides MSTVLQARPRNFVSTPAVERIARRALRYLQSGYSVHLRGPAGTGK 344BDU141951_gvpN TTLALHLADLLSRPIMLVFGDDEFKTSDLIGNQSGYTRKKVVDNYIHSVVKVEDELRHNWVDSRLTLACREGFTLVYDEFNRSRPEVNNVLLSALEEKLLVLPPSGHRPEYLRVNPHFRAIFTSNPEEYAGVHGTQDALLDRLITIHMPEPDELTQQQILIQKVGIEPADALMIVRLVKAFKSQMGNHSATSLRPSLMIANICHEHGVAMMTEDADFRDVCSDVLLSRVTNELSPATHTLWDLFNELTASADVLGPESNSTDVSPQPEADKPVETKGSKGKSTTKSKAKESAKASEEADEAGDDSASAPELDEIESSILTFLTARESASLSEIESELSLTRFKAVDALRSLVEAGYLQKQNGAG KPAIYGLVPEES*Microcystis aeruginosa MTVTETQTRRAVLSLRPGQFVVTPSIDQIATRALRYLNSGFSIHLCG345 NIES-843_gvpN PAGTGKTTLAMHLANCLARPVMLIFGDDDFTSSDLIGSQSGYTHKKLMDNYIHSVLKVEDELKHNWVDSRLTMACREGFTLVYDEFNRSRPEVNNVLLSALEEKILTLPPTSHQPDYLQVNSQFRAIFTSNPEEYCGVHATQDALMDRLVTINMPEPDQLTQTEILAQKTGIGREDALFIVNLVKTFRVKTATEKTSGLRSCLMIAKVCASHDIAANSADSDFRDICADVLLSRTNLSVDKSRAILWEILEDNPLESLSFLEEEEPSDAQVSTSEPSTGNQSLKAIQSLLRGNLPQRKD* Nostoc punctiformeMTTVLNASPQRFVNTPAVQRIAQRALRYLQSGFSIHLRGAAGVGK 346 ATCC 29133_gvpNTTLAMHLADLLNQPIILLFGDDEFKTSDLIGNQLGYTRKKVVDNFIHSVIKVEDEVRQHWVDARLTLACKEGFTLVYDEFNRSHPEVNNVLLSVLEERLLVLPTNQHRAEYIRVHPQFRAILTSNPQEYCGVHATQDALMDRVITIDMPTPDELSQQEIVVHKTGIDSEKAEVIVRIVRTFWSRSGSGQGGGLRSCLMIAKICHEHEISVNPGDPSFQDICADILLSRTNQPLIEATRLLEEVLSEFYHRINTQSQPSEIIPNNQNQIVLEQRVPYEHEVYNYLCNSPGRRFSELAVELGIDRSQIVAALKSLREQGVLVQMQ GNAESPSISQTVAFDSGHLINK*Nostoc sp. 
PCC MTLTANNKKRAVLRVRPGQFVVTPAIEQVAIRALRYLTSGFAIHLR 3477120_gvpN GPAGTGKTTLAMHLANCLDRPIMLIFGDDEFKSSDLIGSESGYTHKKLLDNYIHSVLKVEDEFKQNWVDSRLTLACREGFTLVYDEFNRSRPEVNNVLLSALEEKILTLPPSSNQPEYLHVNPQFRAIFTSNPEEYCGVHSTQDALMDRLVTINMPEPDELTQTEILAQKTALNRADALLIVRLVKAFRSRTGGEKTSGLRSCLMIAKVCAEHNILVSPQSSDFREICADVLFNRTNWSASEAATIFLELLNHLDLQQIEEFKNSIIPEDTDAIAEGGFPTIIDSHFGTLDSEVLEQPGVEDSIPFEQEIYLYLQQYKSAALALVQQEFELSRTVATNALNSLEQKGLVSKNNHVYTIEEPNQS* OctadecabacterMNSNLRATNSGGPDISKTMMPEAREDFVQTESVKSISRRALAYINA 348 antarcticus 307_gvpNGYSVHFRGPAGTGKTTMAMHTAALLGRPVVLITGDEEMITSNLVGAESGYNYRKVTDNYIHTVSKIEESSDRSWNDHRLTTACREGYTLIYDEFTRSRAEANNVLLSVLEEGILVLPAQNRGEPFIKVHPNFRVIFTSNPQEYAGVHEAQDALSDRIVTIDIGEADRELEVSIASSRSGLEVAKTEPIVDMVRAFRDTGEYDQTPTLRACIVICRMVANEKLNTTIDDPFFVQICLDVLGSKSTFGGKEHDKRTQQRKLLLDNLKHYCPSKVSTK PSAKDDESKSTLIQVSSRGSL*Octadecabacter MMPEARKDFVQTDSVKSVSRRALAYINAGYSVHFRGPAGTGKTT 349arcticus 238_gvpN MAMHTAALLGRPVVMITGDEEMVTSNLVGAESGYNYRKVTDNYIHTVSKVEESSDRSWNDHRLTTACREGYTLIYDEFTRSRAEANNVLLSVLEEGILVLPAQNRGEPFIKVHPDFRVIFTSNPQEYAGVHDAQDALSDRIVTIDIGAADRELEVSIASSRSGLEVAKTAPIVDMVRAFRDTGEYDQTPTLRACIMICRMVANEKLNPTIDDSYFVQICLDVLGSKSMFGAKEQGKRTQQEKLLLDNLSHHCPSPPPSKPSAKEAEAKPRSIQA TSRGPA* PelodictyonMRRQGCDSEMNTVLEPKPMPNFVETDYIRDITSRGLTYMKAGFPV 350 phaeoclathratiforme_HFRGPSGTGKTTVALHLASKIGRPVVIIHGDSEYKTSDLIGSEQGYK gvpNYRRLDDNFIHSVHKYEEDMTKQWVNNRLTIAIKKGFTLVYDEFTRSRPEANNILLPILQEKMMSTSSSNEEEYYMKVHPEFRAIFTSNPEEYAGVNRTQDALRDRMVTMDLDYFDYETELMITHAKSGMSLDDAEKIVKIVRGLRESGKTEFDPTIRGSIMIAKTLNVLNARPDKTNELFKKVCQDILTSETSRVGSKTNQERVRGIVNELIDLHS* Phormidium tenueMNTVLQARPRNFVSTPTLERTSIRALRYLQSGYSIHLKGPAGTGKT 351 
NIES-30_gvpNTLALHLADLLARPIMLLFGDDEFKTSDLIGNQSGYTRKKVVDNYIHSVVKVEDELRHNWTDSRLTLACREGFTMVYDEFNRSRPEVNNVLLSALEEKLLVLPPSNNRAEYIRVSPHFRAILTSNPEEYCGVHGTQDALQDRLITINMPEPDELAQQQILVQKVGIDSSAALQIVQLVKAFQSAVAPDMVSSLRPSLMIATICHDHDILPLAENADFRDVCSDILLARSKEPAPDATRHLWNLFNRFVVSQAALVNDLSLKPEAHPTARFHGEEEDDAPLQPLEALVESDIDDVAVEDQPVIGPQDLQGETLPEAVIPEPQGETVVETPAEAEALPEEIARVQVSPDDIETRIFDYLDATGTASLVNIEAALDLNRFQAVNAVKSMLDQGLIEKQETDGQLQGYQLSSN* Planktothrix agardhiiMTTVLQARPKGFVNTPTIEQLTIRALRYLQSGFSLHLRGPAGTGKT 352 str. 7805_gvpNTLAMHLADLLNRPIVLIFGDDELKSSDLIGNQLGYTRKKVVDNFIHSVVKLEDELRQNWIDSRLTLACKEGFTLVYDEFNRSRPEVNNVLLSALEEKLLVLPPNNSRSEYIRVNPHFRAIFTSNPEEYCGVYGTQDALLDRLITIDMPEPDDETQQEILIQKIGISPEDAKNIIEIVKIYLEITTQKKEIKPVQNGKAARPHIDKASGLRPGLIIAKICHEHDISIQENNQDFIKVCADILLSRTNLSLTEAQNKLEKVIKTVLTDGDTSNNSFLPPSETQLTENNSLEIEEQVYQYLQKTTSARVSEIEVALGLNRVQTTNVLRSLLK QGHLKQQDNRFFAVNKQGELIQP*Planktothrix MTTVLQARPKGFVNTPTIEQLTIRALRYLQSGFSLHLRGPAGTGKT 353rubescens_gvpN TLAMHLADLLNRPIVLIFGDDELKSSDLIGNQLGYTRKKVIDNFIHSVVKLEDELRQNWIDSRLTLACKEGFTLVYDEFNRSRPEVNNVLLSALEEKLLVLPPNNSRSEYIRVNPHFRAIFTSNPEEYCGVYGTQDALLDRLITIDMPEPDDETQQEILIQKIGISPEDAKNIIEIVKIYLEITTQKKEIKPVQNGKAARPHIDKASGLRPGLIIAKICHEHDISIQENNQDFIKVCADILLSRTNLSLTEAQNKLEKVIKTVLTDGDTSTNSFLPLSETQLTENNSLEIEEQVYQYLQKTTSARVSEIEVALGLNRVQTTNVLRSLLK QGHLKQQDNRFFAVNKQGELIQP*Psychromonas MSIENLNNVSEIKIEQSDDDHIYPEASEDFVETPYIKEVTERAMLYL 354ingrahamii 37_gvpN1 DAGYPVHFAGPAGTGKTTLAFHIAALRQRPVTLIHGNHEFGTSDLIGKESGYRRHRVVDNYVHSVVKEEEELQSLWSDNRLTTCCRNGDTLVYDEFNRSTPEANNVLLSILEEGILNLPSSRSDGYLEVHPQFRAIFTSNPQEYAGTHATQDALVDRMITIMLHYPDRHTEVRVAIAKSGINSDEAGSIVDIVNEFRELCGSKIVSSGPKTMPTVRASIAIARVLVQKGEHAFRDNTFFHRICRDVLCMYTQQVSFSNRSVLDKQLEDLIMKFCP ATYKSSGSKIRA* PsychromonasMSINNLNISTIKIEQPENDNIYPEASAEFVQTPYIQEVTERALLYLDA 355 ingrahamii 
37_gvpN2GYPVHFAGPAGTGKTTLAFHIAALRKRPVTLIHGNHEFGSSDLIGKESGYRRHRLVDNYVHSVMKEEEELKSLWVDNRLTTCCRNGDTLVYDEFNRSTPEANNVLLSILEEGILNLPSLRSMGDGYLEVHPSFRAIFTSNPQEYAGTHATQDALVDRMITIMLNYPDRDTEVRVAVAKSGISNEEAGFIVDIVNEFRELSNHKSLSSGQKSMPTVRASIAISRVLIQKGEHAFRDNVFFHRVCHDVLCMYIQKISPSNRSFLDKQLEVLIGKFCP AAKSALVPKVVK* RhodobacterMTIPRDLPWGDARTPLFEDEELRSLLDRAEIYLREGIAIHFRGPAGV 356 capsulatus SBGKTTLALHLAQRFARPVTFFVGNDWLGRADIFGRDLGETVSTVQD 1003_gvpNHYISSVRRAERKSRIDWQEAPLARAMRDGHVLVYDEFSRSRPEANAALLSVIEEGVLPLSDPAAGRSHIVAHPDFRVILTSNPRDYVGVQAVPDALLDRMITFSLDGMSFETEVGIVATAARTDPADARAICALIHLLRAEKPGTMEISMRSGIMIARLARAAGVAPDPADPVFVQICADVLGTRMRGSDIDDVMALLLRPDPAPAACAGGAR* RhodobacterMTVLSPSLPHAAGIDAALVENPWLGLRRSGRYFQNAETEALFARA 357 sphaeroidesLGYARAGVCVHLAGPAGLGKTTLALRIAQALGRPVAFMTGNEWL 2.4.1_gvpNGSRDFIGGEIGQTVTSVVDRYIQSVRRTEQSARIDWKESILGQAMRCGQTFIYDEFTRASPEANAALLSVLEEGVLVSTDGASRHQYIEAHPDFRVLLTSNPHEYQGVKAAPDALIDRMVTLRLEEPSAPTLAGIVALRSGLDPATARRIVDLILSVQRSGEMQAPPSMRTAILVARLAAPLRLAGRLSDAALAEIAADVLRGRGLEADAAAFEAKLAAPTPGETAR* Serratia sp. 
ATCCMIKQNTVSQYTVDDDLVVPEASEHFVATSYVNDIIERALVYLRAG 358 39006_gvpNYPVHFAGPSGIGKTTLAFHLAALWGRPVTMLQGNEEFVSSDLTGKDIGYRKSSLVDNYIHSVLKTEEQMNRMWVDNRLTTACRNGDMLIYDEFNRSKAETNNVLLSVLSEGILNLPGLRGVGEGYLDVHPEFRAIFTSNPEEYAGTHKTQDALMDRMITINIGLVDRDTELQILHARSELELKEAAYIVDIIRELRGNEHETKHGLRAGIAIAHILHQQGIKPRYGDKLFHAICYDVLSMDAAKIQHAGRSIYREMVDGVIRKICPPIGSDTVK ASTQKIKAVE* StellaMSTEPAPVMPPSTDIEFGSQRPARPKPAEALAVGYRLSARPAAPST 359 vacuolata_ATCC-LTLRPRADFVETDQVKDLTRRGLGFLRAGYPLHFRGPAGTGKTTL 43931_gvpNALHVAAQLGRPVIVITGDNELGTADLVGSQRGYHYRKVVDQFIHNVTKLEETANQRWTDHRLTTACREGYTLVYDEFTRSRPETHNVLLGVFEEKILFLPAEAREECYIRVHPDFRAIFTSNPQEYAGVHASQDALADRLATIDVDYPNRAMELAVASARTGLAEAEAARIIDLVRAFRASGDYQQTPTMRASLMIARVAAQEGLRISVDDPGFVQLCMDALESRIFSGARQEADARARHRVALLGLLATHCPSEAPVARVATVARAKRKS AS* Thiocapsa rosea strainMSAKPLQDASEVSALNNDNVQPEASDTFVCTPSVEALAERASAYL 360 DSM 235QAGYPVHLAGPAGTGKTTLAFHAAAKRGRPVKLIHGNDELGLAD Ga0242571_11_gvpNMVGQDNGYRRNTLVDNYIHSVVKTQEEVRTFWIDNRVTTACLNGETLIYDEFNRSRPEVNNIFLSILGEGILNLPNRRHQGAGYLEVHPEFRVIFTSNPEEYAGTHKTQDALMDRMITMKIGHYDRETEIRVTRAKSGLPPSEVAIVVDIVRELRGQSVNHHRPTLRACIAIARIMADRRISARSNNSFFRDICRDILDMDSAKVRRDGNALGESPVDDVVASISARAR RPKIVEPKGLHKEI*Tolypothrix sp. 
PCC MTNTENHKKRAVLRVRPGQFVVTPAIEKVAIRALRYLTSGFAIHLR 3617601_gvpN GPAGTGKTTLAMHLANCLDRPIMLIFGDDEFKSSDLIGSESGYTHKKLLDNYIHNVLKVEDELKQNWVDSRLTLACREGLTLVYDEFNRSRPEVNNVLLSALEEKILTLPPSSNQPEYLHVHPKFRAIFTSNPEEYCGVHSTQDALMDRLVTINMPEPDEQTQIEILTHKTGIHHEYAQLIARLVKAFRSATGAEKTSGLRSCLMVAKVCAEHDILVTPENTDFREICADVLFNRTNLSASDATTLFLELLNHVQVKPVEPVDDSDPYDVAEAEIVGAAEPQTDAIAEPVTLDESLLSDQPN* TrichodesmiumMTTVLNVSPDRFVSTPGVERVTQRASRYLESGYSVHLRGPAGVGK 362 erythraeumTTLALHLAHLRQQPIFLMIGDDEFKTSDLIGNKSGYTRKKLVDNYI IMS101_gvpN1HTVLKVEDELRDNWIDSRLTLACKEGFTLIYDEFNRSRPEVNNVLLSVLEEKMLVLPPSQNQSEYIQVHPQFRVILTSNSEEWTGVHATQDALLDRVVTIGMEQPDISTEQNIVIQKTGINPLKAEVIIKLVRSVRQRVDKEDLGSLRSALMISKVCHDHDIPLDGKDSSFSDLCADILISRPNLPRQEALQQLDEVLEEFFPADQPSSSDVGLEKEGSL* TrichodesmiumMTTVLNVSPDRFVSTPSVERVTQRASRYLESGYSVHLRGPAGVGK 363 erythraeumTTLALHLAHLRQQPIFLMIGDDEFKTSDLIGNKSGYTRKKLVDNYI IMS101_gvpN2HTVLKVEDELKHNWIDSRLTLACKEGFTLIYDEFNRSRPEVNNVLLSVLEEKMLVLPPSQNQSEYIQVHPQFRVILTSNSEEWTGVHATQDALLDRVVTIGMGQPDISTEQNIIIQKTGINPLKAEVIIKLVRSVRERLETEDLGSLRSALMISKVCHDHDIPLGGKDSNFSDLCADILISRANLPRQEALKQLDEVLEELFPADQLSISDIGLKKEGSL* gvpV Anabaena-flos-MIKNIQVFFMKTISNRSISRAKISTMPRPKSDASSQLDLYKMVTEK 364 aquae_gvpVQRIQRDMYSIKERMGLLQQRLDILNQQIEATEKTIHKLRQPHSNTA QNIVRSNIFVESNNYQTFEVEY*Aphanizomenon flos- MKSFRHRSIIRAKISTMPRHISEASSQLELYKMVAEKQRISRELSSIK 365aquae NIES-81_gvpV ERMATLQKRLDSLNNEIDNTEKTIHKLRQPHSSTAQNIVRSKNVVESNNYQTFEIEY* Arthrospira platensisMRYKYHRQIQPKLSAIPRQKSQANLYRNSYLLAVEKKRLTEELEV 473 NIES-39_gvpVLQSRSHIIEQRLALIEDQLGELEKDVTQLSVPPSPKPQNNLPVNNPE PPPQSNPTNSSHINTFMVDY*Burkholderia MPIPKKGLHDIRFRHAPGATPLPVHSMYMRISCIEMEKSRRTIERRA 366thailandensis sp. 
AQRRIAAVDSRVADLEREKARLYAAIDNEAPQAGDIRGSFRIRY*Bp5365 strain MSMB43_gvpV DesulfobacteriumMLKNRNRSIKGVQNIKTHAGKVDHVSHPHMAYMRISCLEMEKAR 367 vacuolatum_DSMKNKEKSGAQKRIDMINQRLMEIEKEKAHIQRILGDTSIALESSNVD 3385_gvpV HDSEIKGGFKIKY*Desulfomonile tiedjei MNIRMKGNSRGLRDIRTHSGKVDRVGLPYMAYMSISCLEMEKAR 368DSM 6799_gvpV REKERLSALTRIKNIEQRIREIEAEKDLLLKGVGERTRTDLQKASTPRDQSAQCKGGFKIRY* Legionella drancourtiiMMPALVKGLRNIKTMSNRLDKVQSPHEAFISAAALHREKQRHLQ 369 LLAP12_gvpVELAILRNRLDEINLRLEQINEQQNQVAEAFDISPPRAVKSALRTGIQ SKTGSTSHGFKIKY*Microcystis aeruginosa MTTTRPPRPIRSKISTMPRKQSEADHQLELYKLITEKQRIQEKLEM370 NIES-843_gvpV MERQIQQLKNRLTFVTEQIETTEQSIQNLRTANPPSVAKKPDSPKTVAHSSNNSSNFQTFYLEY* Nostoc punctiformeMHRTPNRRQIQAKLSTMPPQRSQATVYLNAYKMMLEKERLEEEL 371 ATCC 29133_gvpVEKLEARRHQIQQRLAILNSQTIPEENMTHQQANTDLENNTPKFNTL TLEY* Nostoc sp. PCCMLSIIQVFPMTKVRNRGIIRPKITTMPRNKSEASSQLELYKLVTEQQ 372 7120_gvpVRIKQELAFIEQRTVLLKQRLSTLKTQIEGTERSINHLRHSELKYSRIA LPKIFSETNNYQAFDIEY*Planktothrix agardhii MRPFRSQPPILPKISTMPRQKTEATLYRSLYQLAVEKKRLQEELESL373 str. 7805_gvpV GQRFETVTQRLQQIETQIQGLETDVKQIAPPKPPETKPNQPSTPTPTKAEPGSVSTFTLDY* PsychromonasMTAAKRKTLRGLADIRTISSCGTSGQEAYQMYLKRGVLEMEKLR 374 ingrahamii 37_gvpV1RQKEKNSALERVTNINRRLMAIDTDIDFLCQSLKVIEKRTNQENSIV EKSVSRGFKLRY*Psychromonas MIFSKKKNALRGLADIRTLSGCGTSGQEAYQMYLKRGVLEMEKL 375ingrahamii 37_gvpV2 RRQKEKNSALERVRNINYRLMAIDADIDFLCQSLKVIEERTNKENSISNESVTYKKGFKLRY* Serratia sp. 
ATCCMAISTRPLRTLSDIKTHSGRVSGEHQTYRDYFQIGALELERWRRTR 376 39006_gvpVEREAASSRIASIDERIADIDKEKAALLADATAASAVAENNDKSEAA EKKKKSSGLRIKY*Thiocapsa rosea strain MSKFTQPSRSVRDIKTLAGMADDVRAPHKMYMRLFALETERHRR 377DSM 235 LQERASAMLRVDNIDARCAEIAEEMEQLLQILGVEAVAPGGPPAN Ga0242571_11_gvpVARPGSGRVPTQPHRGRGKGTGAGRQTTSGETSVGEAVKIRY* gvpW Anabaena-flos-MELENLYTYAFLEIPSSPLILPQGAANQVVLINGTELAAIVEPGIFLE 378 aquae_gvpWSFQNNDEKIIQMALSHDRVICELFQQITVLPLRFGTYFTSTNNLLNHLKSHEKEYQNKLEKINGKNEFTLKLIPRMIEEIVPSEGGGKDYFLAKKQRYQNQNNFSIAQAAEKQNLIDLITKVNQLPVVVQEQEEQIQIYLLVSCQDKTLLLEQFLTWQKACPRWDLLLGDCLPPYHFI* Aphanizomenon flos-MELENLYTYAFLKTPSFSLHLPQGSTTSVIQIDGNGLSAIVEPGISLD 379 aquae NIES-81_gvpWSFQDDDEKIVQMAIEHDRVICDIFRQITVLPLRFGTYFANTDNLLTHLESYGQEYLDKLEKINCKTEFILKLIPRMITEESPVLESGRHYFLAKKQHYQRQKNFILAQASEKEILINFISKINQIPVIIQEQEEEVRIYLLVNYQDKTLLLEQFLTWQQTCPRWDLFLGEGIPPYHFI* Arthrospira platensisMYVYAFIKSQSISWKSVQGIYEPVVLLEAGALAAVVEPNLQAENL 380 NIES-39_gvpWSADNEEELMRAVLTHDRIVCQIFEETTVLPVRFGTCFDSEARLCEHLTTEGDRYFRQLEKLTGRAEYLLEAIPQPFNQEKPSSDTTAPPTKGRDYFLQKKRLHQQRLNFEQQQEQQWQDFINAIASKYPIVQGKATEDAERIYLLIPRSQEVALVEWVAQQQQNIDLWEFSLGNAVPAYHFL * DolichospermumMKLENFYTYAFLEIPRFPLVLPQGAASQVILINGSGMSAIVEPGISLE 381 circinale_gvpWSFQNNDEKIIQMALSHDRVICELFQQVTVLPLRFGTCFTSTNNLLNYLELHRQEYQEKLEKINGKIEFTLKLIPQTMEEPAPLERGGRDYFLAKKQRYQDQNNFRIAQAAEKQNLIDSISKVNQLPFVIQEKEEEVNIYLLVKSEDKTLLLEQFLNWQKACPRWDLLLGEPLPPYHFI Microcystis aeruginosaMKLYNLYTYAFLKTPIESLKLPVGMANPLLLITGGELSAVVEPEVG 382 NIES-843_gvpWLDTLQNDDERLIQSVLCHDRVICQLFQQTTILPLRFGTSFLEAENLLTHLCSHGQEYQEKIEELEGKGEYLLKCIPRKPEEPVLFSESKGRQYFLAKKQLYEAQQDFYTLQGSEWQNLVNLITQSYPSTRIITAPGTESRIYLLVNLQEEPLLIEQVLHWQKACPRWELQLGQVSPPYHFT* Nostoc punctiformeMSIYAYALLVPTASPLVLPLGMERNTELVYSSGLAALVEPEISLEAI 383 ATCC 29133_gvpWQATDERLLQAVLNHDHVIRELFQQTPLLPLRFGRGFTSVEKLLNHLENHQEQYLETLTQLADKVEYSVKVTACSLLDDSDTIDARGKAYLLAKKQRYQTQQAFQAQQCEQWELLNELILKTYTNVICETRQSDVRQIHFLAQRNDSTLSTQLFSLWQVQCSHWQLALSEPLPPYHFLKNTL I* Nostoc sp. 
PCCMRSPNFYTYAFLNTPDIPLRLPSGNLGQLLLIHGHKLSAVVEPGISL 384 7120_gvpWESSQNNDEEVIKMVLAHDRVICELSQQTTVLPLRFGTYFNSEETLLNHIESHAQEYQKKLDHIQGKTEYTLKLIPRKFEELAKVSGGNGRDYFLAKKLHYEHQKNFIGDQNREKNHLINLIMDVYRSSAIIQDYVEEVRLHLLVDRHDKTLLFKQVLTLQEKCPHWNLILGEPLPPYHFV* gvpR Bacillus-MEIKKIMQAVNDFFGEHVAPPHKITSVEATEDEGWRVIVEVIEERE 385 megaterium_gvpRYMKKYAKDEMLGTYECFVNKEKEVISFKRLDVRYRSAIGIEA* gvpS Bacillus-MSLKQSMENKDIALIDILDVILDKGVAIKGDLIISIAGVDLVYLDLR 386 megaterium_gvpSVLISSVETLVQAKEGNHKPITSEQFDKQKEELMDATGQPSKWTNP LGS* Rhodococcus hoagiiMSATPDRRIALVDLLDRVLGGGVVVAGEITLSIADVDMVHISLRTL 387 103S_gvpSVSSVSALTRPPDEKPENDG* gvpT Bacillus-MATETKLDNTQAENKENKNAENGSKEKNGSKASKTTSSGPIKRA 388 megaterium_gvpTVAGGIIGATIGYVSTPENRKSLLDRIDTDELKSKASDLGTKVKEKSKSSVASLKTSAGSLFKKDKDKSKDDEENVNSSSSETEDDNVQEYDELKEENQTLQDRLSQLEEKMNMLVELSLNKNQDEEAEDTDSDEEENDENDENDENEQDDENEEETSKPRKKDKKEAEEEESESDEDSEEEEEDSRSNKKNKKVKTEEEDEDESEEEKKEAKPKKSTAKKSKNTKA KKNTDEEDDEATSLSSEDDTTA*gvpU Bacillus- MSTGPSFSTKDNTLEYFVKASNKHGFSLDISLNVNGAVISGTMISA 389megaterium_gvpU KEYFDYLSETFEEGSEVAQALSEQFSLASEASESNGEAEAHFIHLKNTKIYCGDSKSTPSKGKIFWRGKIAEVDGFFLGKISDAKSTSKKSS*

TABLE 7
Protein sequences of gvpC from exemplary species

Species | UniProt ID No. | Amino acid sequence | SEQ ID NO:
Anabaena flos-aquae | P09413 | MISLMAKIRQEHQSIAEKVAELSLETREFLSVTTAKRQEQAEKQAQELQAFYKDLQETSQQFLSETAQARIAQAEKQAQELLAFHKELQETSQQFLSATAQARIAQAEKQAQELLAFYQEVRETSQQFLSATAQARIAQAEKQAQELLAFHKELQETSQQFLSATADARTAQAKEQKESLLKFRQDLFVSIFG | 390
Halobacterium salinarum | P24574 | MSVTDKRDEMSTARDKFAESQQEFESYADEFAADITAKQDDVSDLVDAITDFQAEMTNTTDAFHTYGDEFAAEVDHLRADIDAQRDVIREMQDAFEAYADIFATDIADKQDIGNLLAAIEALRTEMNSTHGAFEAYADDFAADVAALRDISDLVAAIDDFQEEFIAVQDAFDNYAGDFDAEIDQLHAAIADQHDSFDATADAFAEYRDEFYRIEVEALLEAINDFQQDIGDFRAEFETTEDAFVAFARDFYGHEITAEEGAAEAEAEPVEADADVEAEAEVSPDEAGGESAGTEEEETEPAEVETAAPEVEGSPADTADEAEDTEAEEETEEEAPEDMVQCRVCGEYYQAITEPHLQTHDMTIQEYRDEYGEDVPLRPDDKT | 391
Halobacterium mediterranei | Q02228 | MSVKDKREKMTATREEFAEVQQAFAAYADEFAADVDDKRDVSELVDGIDTLRTEMNSTNDAFRAYSEEFAADVEHFHTSVADRRDAFDAYADIFATDVAEMQDVSDLLAAIDDLRAEMDETHEAFDAYADAFVTDVATLRDVSDLLTAISELQSEFVSVQGEFNGYASEFGADIDQFHAVVAEKRDGHKDVADAFLQYREEFHGVEVQSLLDNIAAFQREMGDYRKAFETTEEAFASFARDFYGQGAAPMATPLNNAAETAVTGTETEVDIPPIEDSVEPDGEDEDSKADDVEAEAEVETVEMEFGAEMDTEADEDVQSESVREDDQFLDDETPEDMVQCLVCGEYYQAITEPHLQTHDMTIKKYREEYGEDVPLRPDDKA | 392
Microchaete diplosiphon | P08041 | MTPLMIRIRQEHRGIAEEVTQLFKDTQEFLSVTTAQRQAQAKEQAENLHQFHKDLEKDTEEFLTDTAKERMAKAKQQAEDLFQFHKEMAENTQEFLSETAKERMAQAQEQARQLREFHQNLEQTTNEFLADTAKERMAQAQEQKQQLHQFRQDLFASIFGTF | 393
Nostoc sp. | Q8YUS9 | MTALMVRIRQEHRSIAEEVTQLFRETHEFLSATTAHRQEQAKQQAQQLHQFHQNLEQTTHEFLTETTTQRVAQAEAQANFLHKFHQNLEQTTQEFLAETAKNRTEQAKAQSQYLQQFRKDLFASIFGTF | 394

TABLE 8
Amino acid sequences of exemplary GVS and GVA proteins from B. megaterium

GVA Protein | Amino acid sequence | SEQ ID NO.:
gvpB | MSIQKSTNSSSLAEVIDRILDKGIVIDAFARVSVVGIEILTIEARVVIASVDTWLRYAEAVGLLRDDVEENGLPERSNSSEGQPRFSI | 395
gvpR | MEIKKIMQAVNDFFGEHVAPPHKITSVEATEDEGWRVIVEVIEEREYMKKYAKDEMLGTYECFVNKEKEVISFKRLDVRYRSAIGIEA | 396
gvpN | MTVLTDKRKKGSGAFIQDDETKEVLSRALSYLKSGYSIHFTGPAGGGKTSLARALAKKRKRPVMLMHGNHELNNKDLIGDFTGYTSKKVIDQYVRSVYKKDEQVSENWQDGRLLEAVKNGYTLIYDEFTRSKPATNNIFLSILEEGVLPLYGVKMTDPFVRVHPDFRVIFTSNPAEYAGVYDTQDALLDRLITMFIDYKDIDRETAILTEKTDVEEDEARTIVTLVANVRNRSGDENSSGLSLRASLMIATLATQQDIPIDGSDEDFQTLCIDILHHPLTKCLDEENAKSKAEKIILEECKNIDTEEK | 397
gvpF | MSETNETGIYIFSAIQTDKDEEFGAVEVEGTKAETFLIRYKDAAMVAAEVPMKIYHPNRQNLLMHQNAVAAIMDKNDTVIPISFGNVFKSKEDVKVLLENLYPQFEKLFPAIKGKIEVGLKVIGKKEWLEKKVNENPELEKVSASVKGKSEAAGYYERIQLGGMAQKMFTSLQKEVKTDVFSPLEEAAEAAKANEPTGETMLLNASFLINREDEAKFDEKVNEAHENWKDKADFHYSGPWPAYNFVNIRLKVEEK | 398
gvpG | MLHKLVTAPINLVVKIGEKVQEEADKQLYDLPTIQQKLIQLQMMFELGEIPEEAFQEKEDELLMRYEIAKRREIEQWEELTQKRNEES | 399
gvpL | MGELLYLYGLIPTKEAAAIEPFPSYKGFDGEHSLYPIAFDQVTAVVSKLDADTYSEKVIQEKMEQDMSWLQEKAFHHHETVAALYEEFTIIPLKFCTIYKGEESLQAAIEINKEKIENSLTLLQGNEEWNVKIYCDDTELKKGISETNESVKAKKQEISHLSPGRQFFEKKKIDQLIEKELELHKNKVCEEIHDKLKELSLYDSVKKNWSKDVTGAAEQMAWNSVFLLPSLQITKFVNEIEELQQRLENKGWKFEVTGPWPPYHFSSFA | 400
gvpS | MSLKQSMENKDIALIDILDVILDKGVAIKGDLIISIAGVDLVYLDLRVLISSVETLVQAKEGNHKPITSEQFDKQKEELMDATGQPSKWTNPLGS | 401
gvpK | MQPVSQANGRIHLDPDQAEQGLAQLVMTVIELLRQIVERHAMRRVEGGTLTDEQIENLGIALMNLEEKMDELKEVFGLDAEDLNIDLGPLGSLL | 402
gvpJ | MAVEHNMQSSTIVDVLEKILDKGVVIAGDITVGIADVELLTIKIRLIVASVDKAKEIGMDWWENDPYLSSKGANNKALEEENKMLHERLKTLEEKIETKR | 403
gvpT | MATETKLDNTQAENKENKNAENGSKEKNGSKASKTTSSGPIKRAVAGGIIGATIGYVSTPENRKSLLDRIDTDELKSKASDLGTKVKEKSKSSVASLKTSAGSLFKKDKDKSKDDEENVNSSSSETEDDNVQEYDELKEENQTLQDRLSQLEEKMNMLVELSLNKNQDEEAEDTDSDEEENDENDENDENEQDDENEEETSKPRKKDKKEAEEEESESDEDSEEEEEDSRSNKKNKKVKTEEEDEDESEEEKKEAKPKKSTAKKSKNTKAKKNTDEEDDEATSLSSEDDTTA | 404
gvpU | MSTGPSFSTKDNTLEYFVKASNKHGFSLDISLNVNGAVISGTMISAKEYFDYLSETFEEGSEVAQALSEQFSLASEASESNGEAEAHFIHLKNTKIYCGDSKSTPSKGKIFWRGKIAEVDGFFLGKISDAKSTSKKSS | 405

TABLE 9
Amino acid sequences of exemplary GVS and GVA proteins from Serratia sp.

GVA Protein | Amino acid sequence | SEQ ID NO.:
gvpA1 | MAKVQKSTDSSSLAEVVDRILDKGIVIDAWVKVSLVGIELLSIEARVVIASVETYLKYAEAIGLTASAATPA | 406
gvpA2 | MPVNKQYQDEQQQVSLCEALDRVLNKGVVIVADITISVANIDLIYLSLQALVSSVEAKNRLPGRE | 407
gvpA3 | MPVNKQYQDEQQQVSLCEALDRVLNKGVVIVADITISVANIDLIYLSLQALVSSVEAKNRLPGRE | 408
gvpC | MGCLTDGMAQLRKNIDDSHESRIAQQNARVSSVSAQIAGFSTTRARNAAQDARARATFVADNVRGVNRMLSDFCHTREVMSRQQSEERATFVTDMSKKTLALLDGFNAERKSMAERCAKERADFIANVANDVAAFLSASEKDRMAAHAVFFGMTLAKKKTSLAV | 409
gvpN | MIKQNTVSQYTVDDDLVVPEASEHFVATSYVNDIIERALVYLRAGYPVHFAGPSGIGKTTLAFHLAALWGRPVTMLQGNEEFVSSDLTGKDIGYRKSSLVDNYIHSVLKTEEQMNRMWVDNRLTTACRNGDMLIYDEFNRSKAETNNVLLSVLSEGILNLPGLRGVGEGYLDVHPEFRAIFTSNPEEYAGTHKTQDALMDRMITINIGLVDRDTELQILHARSELELKEAAYIVDIIRELRGNEHETKHGLRAGIAIAHILHQQGIKPRYGDKLFHAICYDVLSMDAAKIQHAGRSIYREMVDGVIRKICPPIGSDTVKASTQKIKAVE | 410
gvpV | MAISTRPLRTLSDIKTHSGRVSGEHQTYRDYFQIGALELERWRRTREREAASSRIASIDERIADIDKEKAALLADATAASAVAENNDKSEAAEKKKKSSGLRIKY | 411
gvpF1 | MMSIDKSRNHRAKVLYALCVSDDSTPNYKIRGLEAAPVYSIDQDGLRAVVSDTLSTRLRPERRNITAHQAVLHKLTEEGTVLPMRFGVIARNAEAVKNLLVANQDTIREHFERLDGCVEMGLRVSWDVTNIYEYFVATYPVLSETRDEIWNGNSNANNHREEKIRLGNLYESLRSGDRKESTEKVKEVLLDYCEEIIENPVKKEKDVMNLACLVARERMDEFAKGVFEASKLFDNVYLFDYTGPWAPHNFVTLDLHAPTAKKKTLTRAGTLSD | 412
gvpF2 | MTMNTEAQTEQAIYLYGLTLPDLAAPPILGVDNQHPINTHQCAGLNAVISPVALSDFTGEKGEDNVQNVTWLTPRICRHAQIIDSLMAQGPVYPLPFGTLFSSQNALEQEMKSRATDVFVSLRRITGCQEWALEATLDRKQAVDVLFTEGLDSGRFCLPEAIGRRHLEEQKLRRRLTTELSDWLAHALTAMQNELHPLVRDFRSRRLLDDKILHWAYLLPVEDVAAFQQQVADIVERYEAYGFSFRVTGPWAAYSFCQPDES | 413
gvpF3 | MSLLLYGIVAEDTQLALEPDGSPHAGEEPMQLVKAATLAALVKPCEADVSREPAAALAFGQQIMHVHQQTTIIPIRYGCVLADEDAVTQHLLNHEAHYQTQLVELENCDEMGIRLSLASAEDNAVTTPQASGLDYLRSRKLAYAVPEHAERQAALLNNAFTGLYRRHCAEISMFNGQRTYLLSYLVPRTGLQAFRDQFNTLANNMTDIGVISGPWPPYNFAS | 414
gvpG | MLLIDDILFSPVKGVMWIFRQIHELAEDELAGEADRIRESLTDLYMLLETGQITEDEFEQQEAVLLDRLDALDEEDDMLGDEPGDDEDDEYEEDDDEEDDDEEDDDDEDDDDEDDDDEEDDDDDEDDDDEDEPEGTTK | 415
gvpW | MKPAIYPKFLLESPLKLVFFGGKGGVGKSTCATSTALRLAQEQPQHHFLLVSTDPAHSLQNILSDLVLPKNLDVRELNAAASLHEFKSQHEGVLKEIAYRGTVLDQNDVQGLMDTALPGMDELAAYLEIAEWIQKDTYYRIIIDTAPTGHTLRLLEMPDLIYRWLTALDTLLAKQRYIRKRFAGDNRLDHLDHFLLDMNDSLKAMHELVTDSTRCCFVLVMLAEAMSVEESIDLAGALNQQRVFLSDLVVNRLFPENDCPTCCVERNRQMLALQNGYQRLPGHVFWTLPLLAIEPRGALLHEFWSGVRLLDENEVMATTCHHQLPLRVESSISLPASTFRLLIFAGKGGVGKTTLACATALRLNSEYPELRILLFSADPAHSLSDCLGVTLQQQPISVLVNIDAQEINAQADFDKIRQGYRAELEAFLLDTLPNLDITFDREVLEHLLDLAPPGLDEIMALTAIMDHLDSGRYDMVIVDGAPSGHLLRLLELPELIRDWLKQFFSLLLKYRKVMRFPHLSERLVQLSRELKNLRALLQDTKQTGLYAVTVPTHLALEKTYEMTCALQRLGLTANALFINQITPPSDCTLCQAITSRESLELKCADEMFPSQPHAQIFRQTEPTGLSKLKTLGSALFL | 416
gvpK | MTTNQLSHHSPVFGPTSPAIQRPITEANRHKIDIDGERVRDGLAQLVLTLVKLLHELLERQAIRRMDSGSLSDEEVERLGLALMRQAEELTHLCDVFGFKDDDLNLDLGPLGRLL | 417
gvpX | MVNTTNDINAATRGLLLRMGNAWFEQDELRQAVDIYLKIIEQYPDSKESKTAQTALLTISQRYERDGLFRLSLDILERVGEITPTSI | 418
gvpY | MRALIHFPIIHSPKDLGTLSEAASHLRTETQTRAYLAAVEGFWTMITTTIEGLDLDYTHLKLYQDGLPVCGKENEIVTDVANAGSQNYKLLLTLQHKGAILMGTESPELLLQERDLMTQLLQSTEQTEASLETAKTLLNRRDDYIAQRIDETLQDGEMAILFLGLMHNIEAKLPADIVFIQPLGKPPGGESI | 419
gvpH | MTGNVEGILRGLGDLVEKLVETGEQIKRSGAFDIDTNDGKNAKAVYGFSIKMGLDGNQENRVEPFGNIRRDEQTGEATVQEVSEPLVDVIEESDHVLVLAEMPGVADEDVQVELNGDILTLHSERGSKKYHKEIVLPCSFDDKAMERSCRNGILEVKLGK | 420
gvpZ | MSEELKLKVAEALPKDAGRGYARLDPADMARLNLAVGDIVQLTSKKGTGIAKLMPTYPDMRNKGIVQLDGLTRRNTSLSLDEKVQIEPASCKHATQIVLIPTTITPNQRDLDYIGSLLDGLPVQKGDLLRAHLFGSRSADFKVESTIPDGAVLIDPTTTLVIGKSNAVGNSSHSTQRLSYEDVGGLKNQVRRIREMIELPLRYPEVFERLGIDAPKGVLLSGPPGCGKTLIARIIAQETDAQFFTISGPEIVHKFYGESEAHLRKIFEEAGRKGPSIIFLDEIDSIAPHRDKVVGDVEKRIVAQLLALMDGLKNRGKVIVIAATNLPNAIDPALRRPGRFDREISIPIPDREGRREIIEIHSTGMPLNADVDLNVLADITHGFVGADLEALCREAAMSALRRLLPEIDFSSAELPYDRLAELTVMMDDFRAALCEVSPSAIRELFVDIPDVRWEDVGGLDDVRRRLIESVEWPIKYPELYEQAGVKPPKGLLLAGPPGVGKTLIAKAVANESGVNVISVKGPALMSRYVGDSEKGVRELFLKARQAAPCIIFLDEVDSVIPARNEGAIDSHVAERVLSQFLSEMDGLEELKGVFVMGATNRADLIDPAMLRPGRFDEIIELGLPDEDARRQILAVHLRNKPLGDNIHADDLAERCDGASGAELAAVCNRAALAALRRAIQQSEEAVLSPSTVGETPVALTVRIEQHDFAEVIAEMFGDDA | 421

TABLE 10 Amino Acid Sequences of GV proteins from Anabaena flos-aquaegvp SEQ ID gene Sequence NO: gvpAMAVEKTNSSSSLAEVIDRILDKGIVIDAWVRVSLVGIELLAIEARIVIASVETYLK 422YAEAVGLTQSAAVPA gvpCMISLMAKIRQEHQSIAEKVAELSLETREFLSVTTAKRQEQAEKQAQELQAFYKD 423LQETSQQFLSETAQARIAQAEKQAQELLAFHKELQETSQQFLSATAQARIAQAEKQAQELLAFYQEVRETSQQFLSATAQARIAQAEKQAQELLAFHKELQETSQQFLSATADARTAQAKEQKESLLKFRQDLFVSIFG gvpNMTTTKVNHKRAVLRLRPGQFVVTPAIERVAIRALRYLKSGFPVHLRGPAGTGKT 424TLAMHLANCLDRPVMLLFGDDQFKSSDLIGSESGYTHKKVLDNYIHSVVKLEDEFKQNWVDSRLTLACREGFTLVYDEFNRSRPEVNNVLLSALEEKILSLPPSSNQPEYLSVNPQFRVIFTSNPEEYAGVHSTQDALMDRLVTISMPEPDEITQTEILIQKTNIDRESANFIVRLVKSFRLATGAEKTSGLRSCLMIAKVCADNNIPVTTESLDFPDIAIDILFNRSHLSMSESTNIFLELLDKFSAEELEILNNRVTGDNDFLIDNSQFVSQQLAGQ PN gvpJMLPTRPQTNSSRTINTSTQGSTLADILERVLDKGIVIAGDISISIASTELVHIRIRLLI 425SSVDKAKEMGINWWESDPYLSTKAQRLVEENQQLQHRLESLEAKLNSLTSSSVKEEIPLAADVKDDLYQTSAKIPSPVDTPIEVLDFQAQSSGGTPPYVNTSMEILDFQAQTSAESSSPVGSTVEILDFQAQTSEESSSPVVSTVEILDFQAQTSEESSSPVGSTVEILDFQAQTSEEIPSSVDPAIDV gvpKMVCTPAENFNNSLTIASKPKNEAGLAPLLLTVLELVRQLMEAQVIRRMEEDLLS 426EPDLERAADSLQKLEEQILHLCEMFEVDPADLNINLGEIGTLLPSSGSYYPGQPSSRPSVLELLDRLLNTGIVVDGEIDLGIAQIDLIHAKLRLVLTSKPI gvpFMSIPLYLYGIFPNTIPETLELEGLDKQPVHSQVVDEFCFLYSEARQEKYLASRRNL 427LTHEKVLEQTMHAGFRVLLPLRFGLVVKDWETIMSQLINPHKDQLNQLFQKLAGKREVSIKIFWDAKAELQTMMESHQDLKQQRDNMEGKKLSMEEVIQIGQLIEINLLARKQAVIEVFSQELNPFAQEIVVSDPMTEEMIYNAAFLIPWESESEFSERVEVIDQKFGDRLRIRYNNFTAPYTFAQLDS gvpGMLTKLLLLPIMGPLNGVVWIAEQIQERTNTEFDAQENLHKQLLSLQLSFDIGEIGE 428EEFEIQEEEILLKIQALEEEARLELEAEQEEARLELEAEQEDFEYPPQFTAEVNKD QHLVLLP gvpVMIKNIQVFFMKTISNRSISRAKISTMPRPKSDASSQLDLYKMVTEKQRIQRDMYSI 429KERMGLLQQRLDILNQQIEATEKTIHKLRQPHSNTAQNIVRSNIFVESNNYQTFE VEY gvpWMELENLYTYAFLEIPSSPLILPQGAANQVVLINGTELAAIVEPGIFLESFQNNDEKII 430QMALSHDRVICELFQQITVLPLRFGTYFTSTNNLLNHLKSHEKEYQNKLEKINGKNEFTLKLIPRMIEEIVPSEGGGKDYFLAKKQRYQNQNNFSIAQAAEKQNLIDLITKVNQLPVVVQEQEEQIQIYLLVSCQDKTLLLEQFLTWQKACPKWDLLLGDCLPPY HFI

Example 5: Identification of Alternative B. megaterium Gene Cluster Detectable by TEM in E. coli

The Gas Vesicle gene cluster of Table 8 above was tested to identify possible alternative clusters detectable by TEM.

In particular, the B. megaterium gene cluster can be expressed in E. coli Rosetta 2(DE3)pLysS cells using the two constructs schematically illustrated in FIG. 7 top panel.

The formation of gas vesicles was detected through Transmission Electron Microscopy (TEM) after expression of gas vesicle genes for 22 hours.

The results shown in FIG. 7 bottom panel indicate that the gvpR and gvpT genes in the B. megaterium gene cluster are not necessary for gas vesicle formation.

Therefore, the following alternative GV cluster including 9 gvp gene sequences of B. megaterium shown in the following Table 11 and FIG. 12B was identified as detectable by TEM and ultrasound in mammalian cells (HEK293 and CHO-K1).

TABLE 11 Gvp genes of exemplary GV gene cluster from B. megateriumSeq ID Gene Sequence NO: gvpB ATGAGCATCCAGAAGTCCACCAACAGCAGC 431AGCCTGGCCGAAGTGATCGACCGGATCCTG GACAAGGGCATCGTGATCGACGCCTTCGCCAGAGTGTCCGTCGTGGGCATCGAGATCCTG ACCATCGAGGCCAGAGTCGTGATCGCCAGCGTGGACACCTGGCTGAGATATGCCGAAGCC GTGGGCCTGCTGCGGGACGACGTGGAAGAAAATGGCCTGCCCGAGCGGAGCAACAGCTCT GAGGGACAGCCCCGGTTCAGCATCTGA gvpNATGACCGTGCTGACCGACAAGCGGAAGAAG 432 GGCAGCGGCGCCTTCATCCAGGACGACGAGACAAAAGAGGTGCTGAGCAGAGCCCTGAGC TACCTGAAGTCCGGCTACAGCATCCACTTCACCGGACCTGCCGGCGGAGGCAAGACATCT CTGGCTAGAGCCCTGGCCAAGAAACGGAAGCGGCCCGTGATGCTGATGCACGGCAACCAC GAGCTGAACAACAAGGACCTGATCGGCGATTTCACCGGCTACACCAGCAAGAAAGTGATC GACCAGTACGTGCGGAGCGTGTACAAGAAAGACGAACAGGTGTCCGAGAACTGGCAGGAC GGCAGACTGCTGGAAGCCGTGAAGAATGGCTACACCCTGATCTACGACGAGTTCACCAGA AGCAAGCCCGCTACCAACAACATCTTCCTGAGCATCCTGGAAGAGGGCGTGCTGCCCCTG TACGGCGTGAAGATGACCGACCCTTTCGTGCGCGTGCACCCCGACTTCAGAGTGATCTTC ACCAGCAACCCCGCCGAGTATGCCGGCGTGTACGATACCCAGGACGCCCTGCTGGACCGG CTGATCACCATGTTCATCGACTACAAGGACATCGACCGGGAAACCGCCATCCTGACCGAG AAAACCGACGTGGAAGAGGACGAGGCCCGGACCATCGTGACCCTGGTGGCCAACGTGCGG AACAGAAGCGGCGACGAGAATAGCAGCGGCCTGAGCCTGAGAGCCAGCCTGATGATTGCC ACCCTGGCCACCCAGCAGGACATCCCTATCGATGGCAGCGACGAGGACTTCCAGACCCTG TGCATCGACATCCTGCACCACCCCCTGACCAAGTGCCTGGACGAGGAAAACGCCAAGAGC AAGGCCGAGAAGATCATTCTGGAAGAGTGCAAGAACATCGACACCGAGGAAAAGTGA gvpF ATGAGCGAGACAAACGAGACAGGCATCTAC 433ATCTTCAGCGCCATCCAGACCGACAAGGAC GAGGAATTCGGCGCCGTGGAAGTGGAAGGGACCAAGGCCGAGACATTCCTGATCCGGTAC AAGGACGCCGCCATGGTGGCCGCCGAAGTGCCCATGAAGATCTACCACCCCAACCGGCAG AACCTGCTGATGCACCAGAATGCCGTGGCCGCCATCATGGACAAGAACGACACCGTGATC CCCATCAGCTTCGGCAACGTGTTCAAGAGCAAAGAGGACGTGAAGGTGCTGCTGGAAAAC CTGTACCCCCAGTTCGAGAAGCTGTTCCCCGCCATCAAGGGAAAGATCGAAGTGGGCCTG AAAGTGATCGGCAAGAAAGAGTGGCTGGAAAAGAAAGTGAACGAGAACCCCGAGCTGGAA AAAGTGTCCGCCAGCGTGAAGGGCAAGAGCGAGGCCGCTGGCTACTACGAGAGAATCCAG CTGGGCGGCATGGCCCAGAAGATGTTCACCAGCCTGCAGAAAGAAGTGAAAACCGACGTG TTCAGCCCCCTGGAAGAAGCCGCCGAGGCCGCCAAAGCCAATGAGCCTACAGGCGAGACA 
ATGCTGCTGAACGCCAGCTTCCTGATCAACAGAGAGGACGAGGCCAAGTTCGACGAAAAA GTGAATGAGGCCCACGAGAACTGGAAGGATAAGGCCGACTTCCACTACAGCGGCCCCTGG CCCGCCTACAACTTCGTGAACATCCGGCTGAAGGTGGAAGAGAAGTGA gvpG ATGCTGCACAAGCTCGTGACCGCCCCCATC 434AACCTGGTCGTGAAGATCGGCGAGAAGGTG CAGGAAGAGGCCGACAAGCAGCTGTACGACCTGCCCACCATCCAGCAGAAGCTGATCCAG CTGCAGATGATGTTCGAGCTGGGCGAGATCCCCGAGGAAGCCTTCCAGGAAAAAGAGGAC GAGCTGCTGATGAGATACGAGATCGCCAAGCGGCGCGAGATCGAGCAGTGGGAGGAACTG ACCCAGAAGCGGAACGAGGAAAGCTGA gvpLATGGGCGAGCTGCTGTACCTGTACGGCCTG 435 ATCCCCACCAAAGAGGCCGCTGCCATCGAGCCCTTCCCATTCTACAAGGGCTTCGACGGC GAGCACAGCCTGTACCCTATCGCCTTCGACCAAGTGACCGCCGTGGTGTTCAAGCTGGAC GCCGACACCTACAGCGAGAAAGTGATCCAGGAAAAGATGGAACAGGACATGAGCTGGCTG CAGGAAAAGGCCTTCCACCACCACGAGACAGTGGCCGCCCTGTACGAGGAATTCACCATC ATCCCCCTGAAGTTCTGCACCATCTATAAGGGCGAGGAATCCCTGCAGGCCGCCATCGAG ATCAACAAAGAGAAGATCGAGAACTCCCTGACCCTGCTGCAGGGCAACGAGGAATGGAAC GTGAAGATCTACTGCGACGACACCGAGCTGAAGAAGGGCATCAGCGAGACAAACGAGAGC GTGAAGGCCAAGAAGCAGGAAATCAGCCACCTGAGCCCCGGCAGACAGTTCTTCGAGAAG AAGAAGATTGACCAGCTGATCGAGAAAGAGCTGGAACTGCACAAGAACAAAGTGTGCGAG GAAATCCACGACAAGCTGATTGAGCTGAGCCTGTACGACTCCGTGAAGAAGAACTGGTCC AAGGACGTGACCGGCGCTGCCGAACAGATGGCCTGGAACAGCGTGTTCCTGCTGCCCAGC CTGCAGATCACCAAGTTCGTGAACGAGATCGAGGAACTGCAGCAGCGGCTGGAAAACAAG GGCTGGAAGTTCGAAGTGACCGGCCCCTGGCCTCCCTACCACTTCAGCAGCTTTGCCTGA gvpS ATGAGCCTGAAGCAGAGCATGGAAAACAAG 436GATATCGCCCTGATCGACATCCTGGACGTG ATCCTGGACAAGGGCGTGGCCATCAAGGGCGACCTGATCATCTCTATCGCCGGCGTGGAC CTGGTGTACCTGGACCTGAGAGTGCTGATCTCCAGCGTGGAAACCCTGGTGCAGGCCAAA GAGGGCAACCACAAGCCCATCACCAGCGAGCAGTTCGACAAGCAGAAAGAGGAACTGATG GACGCCACCGGCCAGCCCAGCAAGTGGACAAATCCTCTGGGCAGC gvpK ATGCAGCCCGTGTCCCAGGCCAACGGCAGA 437ATCCACCTGGATCCCGATCAGGCCGAACAG GGACTGGCCCAGCTCGTGATGACCGTGATCGAGCTGCTGCGGCAGATCGTGGAACGGCAC GCCATGAGAAGAGTGGAAGGCGGCACCCTGACCGACGAGCAGATCGAGAATCTGGGAATC GCCCTGATGAACCTGGAAGAGAAGATGGACGAGCTGAAAGAGGTGTTCGGACTGGACGCC GAGGACCTGAACATCGACCTGGGCCCTCTGGGCAGCCTGCTGTGA gvpJ ATGGCCGTGGAACACAACATGCAGAGCAGC 438ACCATCGTGGACGTGCTGGAAAAGATCCTG 
GACAAGGGCGTCGTGATCGCCGGGGACATCACAGTGGGAATCGCCGACGTGGAACTGCTG ACCATCAAGATCCGGCTGATCGTGGCCAGCGTGGACAAGGCCAAAGAAATCGGCATGGAT TGGTGGGAGAACGACCCCTACCTGAGCAGCAAGGGCGCCAACAACAAGGCCCTGGAAGAG GAAAACAAGATGCTGCACGAGCGGCTGAAAACACTGGAAGAGAAGATCGAGACAAAGCGC TGA gvpU ATGAGCACCGGCCCCAGCTTCAGCACCAAG439 GACAACACCCTGGAATACTTCGTGAAGGCC AGCAACAAGCACGGCTTCAGCCTGGACATCAGCCTGAACGTGAACGGGGCCGTGATCAGC GGCACCATGATCAGCGCCAAAGAGTACTTCGACTACCTGAGCGAGACATTCGAAGAGGGC AGCGAGGTGGCCCAGGCCCTGTCTGAGCAGTTTAGCCTGGCCAGCGAGGCCTCCGAGTCT AATGGCGAAGCCGAGGCCCACTTCATCCACCTGAAGAACACCAAGATCTACTGCGGCGAC AGCAAGAGCACCCCCAGCAAGGGCAAGATCTTCTGGCGCGGCAAGATCGCCGAGGTGGAC GGATTCTTCCTGGGAAAGATCAGCGACGCCAAGTCCACCAGCAAGAAGTCCAGCTGA

Each gene is cloned in a pCMVSport plasmid which contains a CMV promoter upstream of each gene and an SV40 polyadenylation tail downstream of each gene, as illustrated in FIG. 12B. The gene cassette elements of the pCMVSport plasmid are reported in Table 11a below.

TABLE 11a Additional elements of the GVP cassettes SEQ ID ElementSequence NO: CMV CGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCGCCCAACGAC 440enhancer/ CCCCGCCCATTGACGTCAATAATGACGTATGTTCCCATAGTAACGCCAA CMVTAGGGACTTTCCATTGACGTCAATGGGTGGAGTATTTACGGTAAACTGC promoterCCACTTGGCAGTACATCAAGTGTATCATATGCCAAGTACGCCCCCTATTGACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCCCAGTACATGACCTTATGGGACTTTCCTACTTGGCAGTACATCTACGTATTAGTCATCGCTATTACCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCGTGGATAGCGGTTTGACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAATGTCGTAACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGAG GTCTATATAAGCAGAGCT SV40AACTTGTTTATTGCAGCTTATAATGGTTACAAATAAAGCAATAGCATCAC 441 polyadeny-AAATTTCACAAATAAAGCATTTTTTTCACTGCATTCTAGTTGTGGTTTGTC lationCAAACTCATCAATGTATCTTATCATGTCTGGATC tail

Example 6: Construction of a GVES Configured for Expression in Mammalian Cells

Using the genes of the exemplary B. megaterium cluster reported in Table 11 above, the development of a synthetic mammalian operon with the minimum number of genes required to produce gas vesicles was investigated.

For this, the Applicant turned to viral elements that have evolved to exploit the eukaryotic genetic machinery to allow for the expression of multiple genes from a single promoter (polycistronic gene expression).

The most common elements used are the internal ribosomal entry sequence (IRES) and the 2A self-cleavage peptide [42]. Briefly, when placed between two genes, the IRES region of the transcribed mRNA forms a secondary structure that enables cap-independent ribosomal entry, leading to co-translation of the downstream gene.

Alternatively, by placing the 2A self-cleavage peptide element between two genes, the resultant mRNA sequence causes a ‘ribosomal skip’ that releases the first protein and proceeds to translate the second protein. The 2A element has a smaller genetic footprint and higher co-expression efficiency for the downstream gene compared with IRES; however, its use results in N- and C-terminal modifications to the proteins.
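The terminal modifications mentioned above can be sketched at the protein level. The following Python snippet is only an illustration under stated assumptions: it uses the 24-residue C-terminal sequence given as SEQ ID NO: 442 in Example 7, assumes a single proline is carried over to the downstream protein, and uses hypothetical toy sequences in place of real gas vesicle proteins. The function name skip_products is not part of the disclosure.

```python
# C-terminal tag left on an upstream protein: GAPGSG linker plus the 2A
# residues up to the skip point (24 residues, SEQ ID NO: 442 in Example 7).
C_TERMINAL_TAG = "GAPGSGATNFSLLKQAGDVEENPG"
# Residue assumed to be carried over to the N-terminus of the downstream protein.
N_TERMINAL_RESIDUE = "P"


def skip_products(proteins):
    """Return the separate polypeptides expected from a 2A-linked reading frame."""
    products = []
    last = len(proteins) - 1
    for index, protein in enumerate(proteins):
        # Every protein except the first gains the N-terminal proline.
        n_term = N_TERMINAL_RESIDUE if index > 0 else ""
        # Every protein except the last keeps the linker and 2A tail.
        c_term = C_TERMINAL_TAG if index < last else ""
        products.append(n_term + protein + c_term)
    return products


# Hypothetical toy sequences, used only to show where the extra residues land.
toy_proteins = ["MKT", "MSE", "MGL"]
for polypeptide in skip_products(toy_proteins):
    print(polypeptide)
```

In this sketch the first protein is modified only at its C-terminus, the last only at its N-terminus, and every internal protein at both ends, which is why tolerance to each modification matters when choosing the gene order.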

To test whether the gas vesicle genes could tolerate the modifications due to the addition of the 2A element, additional experiments were performed, as reported in the following Example 7.

Example 7: Identification of Tolerability B. megaterium Gene Cluster Detectable by TEM

To test whether the gas vesicle genes could tolerate the N- and C-terminal 2A modifications, the genes of the exemplary B. megaterium gene cluster of Example 5 and Table 11 were modified.

In particular, the N-terminal proline and C-terminal 24 amino acid (GAPGSGATNFSLLKQAGDVEENPG) (SEQ ID NO: 442) modifications were tested in Escherichia coli using the bacterial gas vesicle gene cluster, according to the approach schematically illustrated in FIG. 8.

All genes except for the one encoding the structural protein gas vesicle protein B tolerated the N- and C-terminal 2A modifications, as shown by the results summarized in the following Table 12.

TABLE 12
Gene    GVs after N-term addition?    GVs after C-term addition?
gvpB    —                             No
gvpR    Yes                           Yes
gvpN    Yes                           Yes
gvpF    Yes                           Yes
gvpG    Yes                           Yes
gvpL    Yes                           Yes
gvpS    Yes                           Yes
gvpK    Yes                           Yes
gvpJ    Yes                           Yes
gvpT    Yes                           Yes
gvpU    Yes                           Yes

In particular, the results of Table 12 above indicate tolerability of P2A peptide additions to B. megaterium gas vesicle genes. Each gene of the B. megaterium gene cluster was modified with an N-terminal proline after the start codon or with a linker and P2A peptide at the C-terminus, resulting in a total of 21 unique GV gene clusters as illustrated in FIG. 8. E. coli were transformed with each plasmid, gas vesicle expression was induced for a total of 22 hours, and the cells were assayed for the presence of gas vesicles using TEM. The table indicates whether gas vesicles were observed by TEM. Expression and TEM imaging were performed as in [43].
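The count of 21 unique gene clusters can be reproduced with a short enumeration. The Python sketch below is illustrative only: it assumes that the dash in the gvpB row of Table 12 means the N-terminal variant of gvpB was not part of the screen, and the variable names are hypothetical.

```python
# Genes of the B. megaterium cluster listed in Table 12.
GENES = ["gvpB", "gvpR", "gvpN", "gvpF", "gvpG", "gvpL",
         "gvpS", "gvpK", "gvpJ", "gvpT", "gvpU"]

# Assumption: the dash in Table 12 marks a variant that was not tested.
NOT_TESTED = {("gvpB", "N-term")}

# One modified cluster per (gene, modified terminus) pair.
variants = [(gene, end)
            for gene in GENES
            for end in ("N-term", "C-term")
            if (gene, end) not in NOT_TESTED]

# Outcome reported in Table 12: only the C-terminal gvpB variant gave no GVs.
no_gas_vesicles = {("gvpB", "C-term")}
tolerated = [v for v in variants if v not in no_gas_vesicles]

print(len(variants))   # number of unique modified clusters
print(len(tolerated))  # number of variants in which GVs were observed
```

Eleven genes with two possible modifications each give 22 combinations; leaving out the single untested combination yields the 21 clusters stated in the text.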

Example 8: Engineering of a GVPC Construct

An exemplary polynucleotide construct was provided including all the genes of the GV gene cluster of B. megaterium reported in Table 11. A GVPC construct was therefore provided using the related GVA genes separated by separation elements encoding peptide 2A.

The sequence of this exemplary GVPC construct, in which the gvp genes are included in a pCMVSport backbone, is reported in Table 13 below. gvp N, F, G, L, S, K, J, U and EmGFP are separated by a GAPGSG-p2A sequence.

TABLE 13 Exemplary GVPC construct SEQ ID Construct Sequence NO: CMV: gvpCGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCGCCCAACGACCC 443 NFGLSKJCCGCCCATTGACGTCAATAATGACGTATGTTCCCATAGTAACGCCAATAGG U-GACTTTCCATTGACGTCAATGGGTGGAGTATTTACGGTAAACTGCCCACTTG EmGFP:GCAGTACATCAAGTGTATCATATGCCAAGTACGCCCCCTATTGACGTCAATG polyAACGGTAAATGGCCCGCCTGGCATTATGCCCAGTACATGACCTTATGGGACTTTCCTACTTGGCAGTACATCTACGTATTAGTCATCGCTATTACCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCGTGGATAGCGGTTTGACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAATGTCGTAACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGAGGTCTATATAAGCAGAGCTCGTTTAGTGAACCGTCAGATCGCCTGGAGACGCCATCCACGCTGTTTTGACCTCCATAGAAGACACCGGGACCGATCCAGCCTCCGGACTCTAGCCTAGGCTTTTGCAAAAAGCTATTTAGGTGACACTATAGAAGGTACGCCTGCAGGTACCGAGCTCGGATCCAGTACCCTTCACCATGACCGTGCTGACCGACAAGCGGAAGAAGGGCAGCGGCGCCTTCATCCAGGACGACGAGACAAAAGAGGTGCTGAGCAGAGCCCTGAGCTACCTGAAGTCCGGCTACAGCATCCACTTCACCGGACCTGCCGGCGGAGGCAAGACATCTCTGGCTAGAGCCCTGGCCAAGAAACGGAAGCGGCCCGTGATGCTGATGCACGGCAACCACGAGCTGAACAACAAGGACCTGATCGGCGATTTCACCGGCTACACCAGCAAAAAGGTGATCGACCAGTACGTGCGGAGCGTGTACAAGAAAGACGAACAGGTGTCCGAGAACTGGCAGGACGGCAGACTGCTGGAAGCCGTGAAGAATGGCTACACCCTGATCTACGACGAGTTCACCAGAAGCAAGCCCGCTACCAACAACATCTTCCTGAGCATCCTTGAGGAGGGCGTGCTGCCCCTGTACGGCGTGAAGATGACCGACCCTTTCGTGCGCGTGCACCCCGACTTCAGAGTGATCTTTACCAGCAACCCCGCCGAGTATGCCGGCGTGTACGATACCCAGGACGCCCTGCTGGACCGGCTGATCACCATGTTCATCGACTACAAGGACATCGACCGGGAAACCGCTATCCTGACCGAGAAAACTGACGTGGAAGAAGACGAGGCCCGGACCATCGTGACCCTGGTGGCCAACGTGCGGAACAGAAGCGGCGACGAGAATAGCAGCGGCCTGAGCCTGAGAGCCAGCCTGATGATTGCCACCCTGGCCACCCAGCAGGACATCCCTATCGATGGCAGCGACGAGGACTTCCAGACCCTGTGCATCGACATCCTGCACCACCCCCTGACCAAGTGCCTGGACGAAGAGAACGCCAAGAGCAAGGCCGAGAAGATCATTCTCGAAGAGTGCAAGAACATCGACACCGAGGAGAAGGGTGCCCCGGGATCTGGCGCAACAAATTTTAGTCTTTTAAAGCAGGCAGGAGACGTCGAGGAAAACCCTGGACCCGTGAGCGAGACAAACGAGACAGGCATCTACATCTTCAGCGCCATCCAGACAGACAAGGATGAGGAATTCGGCGCCGTGGAAGTGGAAGGGACCAAGGCTGAGACATTCCTGATCCGGTATAAGGACGCCGCCATGGTGGCCGCCGAAGTGCCCATGAAGATCTACCACCCCAACCGGCAGAACCTGCTGATGCACCAGAATGCCGTGGCCGCCAT
CATGGACAAGAACGACACCGTGATCCCCATCAGCTTCGGCAACGTGTTCAAGAGCAAAGAGGACGTGAAGGTGCTCCTGGAAAACCTGTACCCCCAGTTCGAGAAGCTGTTCCCCGCCATCAAGGGAAAGATCGAAGTGGGCCTGAAGGTGATCGGCAAGAAAGAGTGGCTCGAAAAGAAAGTGAACGAGAACCCCGAGCTGGAAAAAGTGTCCGCCAGCGTGAAGGGCAAGAGCGAGGCCGCTGGCTACTACGAGAGAATCCAGCTGGGCGGCATGGCCCAGAAGATGTTCACAAGCCTGCAGAAAGAAGTGAAAACCGACGTGTTCAGCCCCCTGGAAGAAGCCGCCGAGGCCGCCAAAGCCAATGAGCCTACAGGCGAAACAATGCTGCTGAACGCCAGCTTCCTGATCAACAGAGAGGATGAGGCCAAGTTCGACGAGAAAGTCAATGAGGCCCACGAGAACTGGAAGGATAAGGCCGACTTCCACTACAGCGGCCCCTGGCCCGCCTACAACTTCGTGAACATCCGGCTGAAGGTGGAAGAGAAGGGGGCACCTGGCTCGGGAGCGACCAACTTCTCATTACTCAAACAAGCCGGAGACGTTGAGGAGAATCCAGGCCCTGTGCTGCACAAGCTCGTGACCGCCCCCATCAACCTGGTCGTGAAGATCGGCGAGAAGGTGCAGGAAGAGGCCGACAAGCAGCTGTACGACCTGCCCACCATCCAGCAGAAGCTGATCCAGCTGCAGATGATGTTCGAGCTGGGCGAGATCCCCGAGGAAGCCTTCCAGGAAAAAGAGGACGAACTGCTGATGAGATACGAGATCGCCAAGCGGCGCGAGATTGAGCAGTGGGAAGAACTGACCCAGAAGCGGAATGAGGAAAGCGGTGCCCCGGGATCTGGCGCAACAAATTTTAGTCTTTTAAAGCAGGCAGGAGACGTCGAGGAAAACCCTGGACCCGTGGGCGAGCTGCTGTACCTCTACGGCCTGATCCCCACCAAAGAGGCCGCTGCTATCGAGCCCTTCCCATTCTACAAGGGCTTCGACGGCGAGCACAGCCTGTACCCTATCGCCTTCGACCAAGTGACCGCCGTGGTGTTCAAGCTGGACGCCGACACCTACAGCGAGAAAGTGATCCAGGAAAAGATGGAACAGGACATGAGCTGGCTGCAGGAAAAGGCCTTCCACCACCACGAGACAGTGGCCGCCCTGTATGAGGAATTCACCATCATCCCCCTGAAGTTCTGCACCATCTATAAGGGAGAGGAATCCCTGCAGGCCGCCATCGAGATCAACAAAGAGAAGATCGAAAACTCCCTGACCCTGCTGCAGGGCAACGAGGAATGGAACGTGAAGATCTACTGCGACGACACCGAGCTGAAGAAGGGCATCAGCGAGACAAACGAGAGCGTGAAGGCCAAGAAGCAGGAAATCAGCCACCTGAGCCCCGGCAGACAGTTCTTCGAGAAGAAGAAGATTGACCAGCTCATCGAGAAAGAGCTGGAACTGCACAAGAACAAAGTGTGCGAGGAAATCCACGACAAGCTGATTGAGCTGAGCCTCTACGACTCCGTGAAGAAGAACTGGTCCAAGGACGTGACAGGCGCTGCCGAACAGATGGCCTGGAACAGCGTGTTCCTGCTGCCCAGCCTGCAGATCACCAAGTTCGTGAACGAGATCGAGGAACTCCAGCAGCGGCTGGAGAACAAGGGATGGAAGTTCGAAGTGACCGGCCCCTGGCCTCCCTACCACTTCAGCAGCTTTGCCGGGGCACCTGGCTCGGGAGCGACCAACTTCTCATTACTCAAACAAGCCGGAGACGTTGAGGAGAATCCAGGCCCTGTGAGCCTGAAGCAGAGCATGGAGAATAAGGATATCGCCCTGATCGACATCCTCGACGTGATCCTGGACAAGGGAGTGGCCATCAAGGGCGACCTGATCATCTCTATCGCCGGCGTGGACCTGGTGTACCTGGATCTGAGAGTGCTGATCTCCA
GCGTGGAAACCCTGGTGCAGGCCAAAGAGGGCAACCACAAGCCCATCACCAGCGAGCAGTTCGACAAGCAGAAAGAGGAGCTGATGGACGCCACCGGCCAGCCCAGCAAGTGGACAAATCCTCTGGGCAGCGGCGCTCCCGGGTCAGGTGCCACGAATTTTTCGTTGTTGAAGCAAGCTGGGGATGTTGAAGAGAACCCAGGGCCTGTGCAGCCCGTGTCCCAGGCCAACGGCAGAATCCACCTGGATCCCGATCAGGCCGAACAGGGACTGGCCCAGCTCGTGATGACCGTGATCGAGCTGCTGCGGCAGATCGTGGAACGGCACGCCATGAGAAGAGTGGAAGGCGGCACCCTGACCGACGAGCAGATCGAGAATCTGGGAATCGCTCTGATGAACCTGGAGGAGAAGATGGACGAGCTGAAAGAGGTGTTCGGACTGGACGCTGAGGATCTGAACATCGACCTGGGCCCTCTGGGCAGCCTGCTGGGTGCCCCGGGATCTGGCGCAACAAATTTTAGTCTTTTAAAGCAGGCAGGAGACGTCGAGGAAAACCCTGGACCCGTGGCCGTGGAACACAACATGCAGAGCAGCACCATCGTGGACGTGCTGGAAAAGATCCTGGACAAGGGCGTCGTGATCGCCGGGGACATCACAGTGGGAATCGCCGACGTGGAACTGCTGACCATCAAGATCCGGCTGATCGTGGCCAGCGTGGACAAGGCCAAAGAAATCGGCATGGATTGGTGGGAGAACGACCCCTACCTGAGCAGCAAGGGCGCCAACAACAAGGCTCTGGAAGAGGAAAACAAGATGCTGCACGAGCGGCTGAAAACACTGGAAGAGAAGATCGAGACAAAGCGCGGGGCACCTGGCTCGGGAGCGACCAACTTCTCATTACTCAAACAAGCCGGAGACGTTGAGGAGAATCCAGGCCCTGTGAGCACCGGCCCCAGCTTCAGCACCAAGGACAACACCCTGGAATACTTCGTGAAGGCCAGCAACAAGCACGGCTTTAGCCTCGACATCAGCCTGAACGTGAATGGGGCCGTGATTAGCGGCACCATGATCAGCGCCAAAGAGTACTTCGACTACCTGAGCGAGACATTCGAAGAGGGCAGCGAAGTGGCCCAGGCCCTGTCTGAGCAGTTTAGCCTGGCTAGCGAGGCCTCCGAGTCTAATGGCGAAGCCGAGGCCCACTTCATCCACCTGAAGAACACCAAGATCTACTGCGGCGACAGCAAGAGCACCCCCAGCAAGGGCAAGATCTTCTGGCGCGGCAAGATCGCCGAGGTGGACGGATTCTTCCTGGGAAAAATCAGCGACGCCAAGTCCACCAGCAAGAAGTCCAGCGGCGCTCCCGGGTCAGGTGCCACGAATTTTTCGTTGTTGAAGCAAGCTGGGGATGTTGAAGAGAACCCAGGGCCTGTGGTGTCCAAGGGCGAGGAACTGTTCACCGGCGTGGTGCCCATCCTGGTGGAACTGGATGGCGACGTGAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAAGGCGACGCCACATACGGAAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCTTGGCCTACCCTCGTGACCACACTGACCTACGGCGTGCAGTGCTTCGCCAGATACCCCGACCACATGAAGCAGCACGATTTCTTCAAGAGCGCCATGCCCGAGGGCTACGTGCAGGAACGGACCATCTTCTTCAAGGACGACGGCAACTACAAGACAAGAGCCGAAGTGAAGTTCGAGGGCGACACCCTCGTGAACCGGATCGAGCTGAAGGGCATCGACTTCAAAGAGGATGGCAACATCCTGGGCCACAAGCTGGAGTACAACTACAACAGCCACAAGGTGTACATCACCGCCGACAAGCAGAAAAACGGCATCAAAGTGAACTTCAAGACCCGGCACAACATCGAGGACGGCAGCGTGCAGCTGGCCGACCACTACCAGCAGAACACCCCCATCGGAGAT
GGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACACAAAGCGCCCTGAGCAAGGACCCCAACGAGAAGCGGGACCACATGGTGCTGCTGGAATTTGTGACCGCCGCTGGCATCACCCTGGGCATGGACGAGCTGTACAAGTGACTCGAGTCTAGAGGGCCCCGTGGCTGTAATCTAGAGGATCCCTCGAGGGGCCCAAGCTTACGCGTGCATGCGACGTCATAGCTCTCTCCCTATAGTGAGTCGTATTATAAGCTAGCTTGGGATCTTTGTGAAGGAACCTTACTTCTGTGGTGTGACATAATTGGACAAACTACCTACAGAGATTTAAAGCTCTAAGGTAAATATAAAATTTTTAAGTGTATAATGTGTTAAACTAGCTGCATATGCTTGCTGCTTGAGAGTTTTGCTTACTGAGTATGATTTATGAAAATATTATACACAGGAGCTAGTGATTCTAATTGTTTGTGTATTTTAGATTCACAGTCCCAAGGCTCATTTCAGGCCCCTCAGTCCTCACAGTCTGTTCATGATCATAATCAGCCATACCACATTTGTAGAGGTTTTACTTGCTTTAAAAAACCTCCCACACCTCCCCCTGAACCTGAAACATAAAATGAATGCAATTGTTGTTGTTAACTTGTTTATTGCAGCTTATAATGGTTACAAATAAAGCAATAGCATCACAAATTTCACAAATAAAGCATTTTTTTCACTGCATTCTAGTTGTGGTTTGTCCAAACTCATCAATGTATCTTATCAT GTCTGGATC

The DNA sequence for the CMV enhancer/CMV promoter used and the DNA sequence for the SV40 polyadenylation tail used are the same as reported in Table 11a above.
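As a consistency check on the GAPGSG-p2A separation elements of Example 8, the nucleotide segments found between the gvp genes in Table 13 can be translated and compared with the peptide of SEQ ID NO: 442. The Python sketch below is a minimal illustration, not part of the disclosed methods: it assumes the standard genetic code, and the three linker strings were copied from the sequence of Table 13 as it appears here, so any transcription error in that table would carry over.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate a DNA string in frame 1, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Three nucleotide versions of the linker-P2A element seen in Table 13.
LINKERS = [
    "GGTGCCCCGGGATCTGGCGCAACAAATTTTAGTCTTTTAAAGCAGGCAGGAGACGTCGAGGAAAACCCTGGACCC",
    "GGGGCACCTGGCTCGGGAGCGACCAACTTCTCATTACTCAAACAAGCCGGAGACGTTGAGGAGAATCCAGGCCCT",
    "GGCGCTCCCGGGTCAGGTGCCACGAATTTTTCGTTGTTGAAGCAAGCTGGGGATGTTGAAGAGAACCCAGGGCCT",
]

# SEQ ID NO: 442 followed by the proline that starts the downstream protein.
EXPECTED_PEPTIDE = "GAPGSGATNFSLLKQAGDVEENPG" + "P"

for linker in LINKERS:
    print(translate(linker) == EXPECTED_PEPTIDE)
```

All three segments differ at the nucleotide level yet encode the same peptide, so the separation elements of the polycistronic construct are synonymous recodings of one GAPGSG-p2A sequence.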

Example 9: Identification of Detectable Gene Clusters in Mammalian Cells

To identify a set of genes capable of assembling gas vesicles in the mammalian cell, an exemplary GVES was constructed using the exemplary GV gene cluster from B. megaterium reported in Table 11 above, which can be used as a Gas Vesicle Reporting Component as will be understood by a skilled person upon review of the instant disclosure.

A transient transfection screening assay was performed to allow the testing of different gas vesicle gene clusters without the need to optimize their stoichiometry and expression levels individually, although from the previous work these are expected to be important parameters.

In particular, cell culture, transient transfection of HEK 293T and CHO-K1 cells and TEM analysis were performed as described in the materials and methods with various gene clusters.

An exemplary GV cluster comprising the nine B. megaterium gvp genes of Table 11 above was shown to be detectable by TEM and BURST ultrasound.

In particular, a monocistronic GVES with the nine B. megaterium gvp genes of Table 11 was used in the experiments illustrated in FIGS. 9 and 12.

Example 10: Identification of Bottleneck Genes in Mammalian Cells to Enable Robust GV Formation in Mammalian Cells

Genes having a lower expression rate in GV constructs herein described (herein also indicated as bottleneck genes) were identified in exemplary mammalian HEK293T cells using an experimental approach illustrated in FIGS. 9A-9C.

In particular, to test the efficiency with which gas vesicles could be formed when a given gene was supplied only on the polycistronic plasmid, and thereby identify “bottleneck” genes, the HEK293T cells were co-transfected with a monocistronic plasmid containing gvpB, 7 other monocistronic plasmids including all but the gene being assayed, and the polycistronic plasmid (for example Table 13) according to the approach schematically illustrated in FIG. 9A.

A qualitative estimate of the relative number of gas vesicles produced when each indicated gene was supplied solely by the polycistronic plasmid is reported in FIG. 9B, and representative TEM images of gas vesicles in the lysate of HEK293T cells for all 8 assays are shown in FIG. 9C.

These results suggest that gvpN, gvpS and gvpU supplied in either monocistronic or polycistronic form supported abundant gas vesicle assembly. However, the production of gas vesicles was significantly reduced when gvpJ, gvpF, gvpG, gvpL or gvpK was supplied from the polycistronic vector. Therefore, these results supported the conclusion that these genes represented a bottleneck in gas vesicle formation for the tested GV cluster.
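The design of the eight co-transfection assays of this Example can be laid out programmatically. The Python sketch below is only a bookkeeping aid under stated assumptions: gene names follow Table 13, the outcome labels paraphrase the qualitative results described above, and the function and variable names are hypothetical.

```python
# Genes carried by the polycistronic plasmid of Table 13 (EmGFP omitted).
POLYCISTRONIC_GENES = ["gvpN", "gvpF", "gvpG", "gvpL",
                       "gvpS", "gvpK", "gvpJ", "gvpU"]

# Genes whose supply from the polycistronic plasmid alone reduced GV production.
BOTTLENECK_GENES = {"gvpJ", "gvpF", "gvpG", "gvpL", "gvpK"}


def transfection_mix(assayed_gene):
    """List the plasmids co-transfected when one gene is being assayed."""
    # gvpB is always supplied on its own monocistronic plasmid.
    monocistronic = ["gvpB"] + [gene for gene in POLYCISTRONIC_GENES
                                if gene != assayed_gene]
    return {"assayed": assayed_gene,
            "monocistronic": monocistronic,
            "polycistronic": "gvpNFGLSKJU-EmGFP"}


for gene in POLYCISTRONIC_GENES:
    mix = transfection_mix(gene)
    outcome = "reduced" if gene in BOTTLENECK_GENES else "abundant"
    print(gene, len(mix["monocistronic"]), "monocistronic plasmids,", outcome)
```

Each assay therefore uses eight monocistronic plasmids (gvpB plus seven others) together with the polycistronic plasmid, so the assayed gene is the only one supplied solely in polycistronic form.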

Example 11: Optimization of Gene Stoichiometry Through Booster Construct

In order to address the stoichiometry issues raised by bottleneck genes in the exemplary B. megaterium cluster identified in Example 10, a booster plasmid comprising duplicate cassettes for the bottleneck genes was provided.

In particular, a booster plasmid containing gvp genes J, F, G, L and K connected with p2A elements was constructed in a pCMVSport backbone to elevate the expression of these genes.

The related sequence is reported in Table 14 below. gvpJ, F, G, L and K are separated by a GAPGSG-p2A sequence.

TABLE 14 Exemplary GPVC booster construct SEQ ID Construct Sequence NOCMV: gvp CGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCGCCCAACGACCC 444 JFGLK:CCGCCCATTGACGTCAATAATGACGTATGTTCCCATAGTAACGCCAATAGG polyAGACTTTCCATTGACGTCAATGGGTGGAGTATTTACGGTAAACTGCCCACTTGGCAGTACATCAAGTGTATCATATGCCAAGTACGCCCCCTATTGACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCCCAGTACATGACCTTATGGGACTTTCCTACTTGGCAGTACATCTACGTATTAGTCATCGCTATTACCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCGTGGATAGCGGTTTGACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAATGTCGTAACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGAGGTCTATATAAGCAGAGCTCGTTTAGTGAACCGTCAGATCGCCTGGAGACGCCATCCACGCTGTTTTGACCTCCATAGAAGACACCGGGACCGATCCAGCCTCCGGACTCTAGCCTAGGCTTTTGCAAAAAGCTATTTAGGTGACACTATAGAAGGTACGCCTGCAGGTACCGAGCTCGGATCCAGTACCCTTCACCATGGCCGTGGAACACAACATGCAGAGCAGCACCATCGTGGACGTGCTGGAAAAGATCCTGGACAAGGGCGTCGTGATCGCCGGGGACATCACAGTGGGAATCGCCGACGTGGAACTGCTGACCATCAAGATCCGGCTGATCGTGGCCAGCGTGGACAAGGCCAAAGAAATCGGCATGGATTGGTGGGAGAACGACCCCTACCTGAGCAGCAAGGGCGCCAACAACAAGGCCCTGGAAGAGGAAAACAAGATGCTGCACGAGCGGCTGAAAACACTGGAAGAGAAGATCGAGACAAAGCGCGGTGCCCCGGGATCTGGCGCAACAAATTTTAGTCTTTTAAAGCAGGCAGGAGACGTCGAGGAAAACCCTGGACCCGTGAGCGAGACAAACGAGACAGGCATCTACATCTTCAGCGCCATCCAGACAGACAAGGATGAGGAATTCGGCGCCGTGGAAGTGGAAGGGACCAAGGCTGAGACATTCCTGATCCGGTATAAGGACGCCGCCATGGTGGCCGCCGAAGTGCCCATGAAGATCTACCACCCCAACCGGCAGAACCTGCTGATGCACCAGAATGCCGTGGCCGCCATCATGGACAAGAACGACACCGTGATCCCCATCAGCTTCGGCAACGTGTTCAAGAGCAAAGAGGACGTGAAGGTGCTCCTGGAAAACCTGTACCCCCAGTTCGAGAAGCTGTTCCCCGCCATCAAGGGAAAGATCGAAGTGGGCCTGAAGGTGATCGGCAAGAAAGAGTGGCTCGAAAAGAAAGTGAACGAGAACCCCGAGCTGGAAAAAGTGTCCGCCAGCGTGAAGGGCAAGAGCGAGGCCGCTGGCTACTACGAGAGAATCCAGCTGGGCGGCATGGCCCAGAAGATGTTCACAAGCCTGCAGAAAGAAGTGAAAACCGACGTGTTCAGCCCCCTGGAAGAAGCCGCCGAGGCCGCCAAAGCCAATGAGCCTACAGGCGAAACAATGCTGCTGAACGCCAGCTTCCTGATCAACAGAGAGGATGAGGCCAAGTTCGACGAGAAAGTCAATGAGGCCCACGAGAACTGGAAGGATAAGGCCGACTTCCACTACAGCGGCCCCTGGCCCGCCTACAACTTCGTGAACATCCGGCTGAAGGTGGAAGAGAAGGGGGCACCTGGCTCGGGAGCGACCAACTTCTCATTACTCAAACAAGCCGGAGACGTTGAGGAGAATCCAGGCCCTGTG
CTGCACAAGCTCGTGACCGCCCCCATCAACCTGGTCGTGAAGATCGGCGAGAAGGTGCAGGAAGAGGCCGACAAGCAGCTGTACGACCTGCCCACCATCCAGCAGAAGCTGATCCAGCTGCAGATGATGTTCGAGCTGGGCGAGATCCCCGAGGAAGCCTTCCAGGAAAAAGAGGACGAACTGCTGATGAGATACGAGATCGCCAAGCGGCGCGAGATTGAGCAGTGGGAAGAACTGACCCAGAAGCGGAATGAGGAAAGCGGTGCCCCGGGATCTGGCGCAACAAATTTTAGTCTTTTAAAGCAGGCAGGAGACGTCGAGGAAAACCCTGGACCCGTGGGCGAGCTGCTGTACCTCTACGGCCTGATCCCCACCAAAGAGGCCGCTGCTATCGAGCCCTTCCCATTCTACAAGGGCTTCGACGGCGAGCACAGCCTGTACCCTATCGCCTTCGACCAAGTGACCGCCGTGGTGTTCAAGCTGGACGCCGACACCTACAGCGAGAAAGTGATCCAGGAAAAGATGGAACAGGACATGAGCTGGCTGCAGGAAAAGGCCTTCCACCACCACGAGACAGTGGCCGCCCTGTATGAGGAATTCACCATCATCCCCCTGAAGTTCTGCACCATCTATAAGGGAGAGGAATCCCTGCAGGCCGCCATCGAGATCAACAAAGAGAAGATCGAAAACTCCCTGACCCTGCTGCAGGGCAACGAGGAATGGAACGTGAAGATCTACTGCGACGACACCGAGCTGAAGAAGGGCATCAGCGAGACAAACGAGAGCGTGAAGGCCAAGAAGCAGGAAATCAGCCACCTGAGCCCCGGCAGACAGTTCTTCGAGAAGAAGAAGATTGACCAGCTCATCGAGAAAGAGCTGGAACTGCACAAGAACAAAGTGTGCGAGGAAATCCACGACAAGCTGATTGAGCTGAGCCTCTACGACTCCGTGAAGAAGAACTGGTCCAAGGACGTGACAGGCGCTGCCGAACAGATGGCCTGGAACAGCGTGTTCCTGCTGCCCAGCCTGCAGATCACCAAGTTCGTGAACGAGATCGAGGAACTCCAGCAGCGGCTGGAGAACAAGGGATGGAAGTTCGAAGTGACCGGCCCCTGGCCTCCCTACCACTTCAGCAGCTTTGCCGGGGCACCTGGCTCGGGAGCGACCAACTTCTCATTACTCAAACAAGCCGGAGACGTTGAGGAGAATCCAGGCCCTGTGCAGCCCGTGTCCCAGGCCAACGGCAGAATCCACCTGGATCCCGATCAGGCCGAACAGGGACTGGCCCAGCTCGTGATGACCGTGATCGAGCTGCTGCGGCAGATCGTGGAACGGCACGCCATGAGAAGAGTGGAAGGCGGCACCCTGACCGACGAGCAGATCGAGAATCTGGGAATCGCCCTGATGAACCTGGAAGAGAAGATGGACGAGCTGAAAGAGGTGTTCGGACTGGACGCCGAGGACCTGAACATCGACCTGGGCCCTCTGGGCAGCCTGCTGTGATAATCTAGAGGATCCCTCGAGGGGCCCAAGCTTACGCGTGCATGCGACGTCATAGCTCTCTCCCTATAGTGAGTCGTATTATAAGCTAGCTTGGGATCTTTGTGAAGGAACCTTACTTCTGTGGTGTGACATAATTGGACAAACTACCTACAGAGATTTAAAGCTCTAAGGTAAATATAAAATTTTTAAGTGTATAATGTGTTAAACTAGCTGCATATGCTTGCTGCTTGAGAGTTTTGCTTACTGAGTATGATTTATGAAAATATTATACACAGGAGCTAGTGATTCTAATTGTTTGTGTATTTTAGATTCACAGTCCCAAGGCTCATTTCAGGCCCCTCAGTCCTCACAGTCTGTTCATGATCATAATCAGCCATACCACATTTGTAGAGGTTTTACTTGCTTTAAAAAACCTCCCACACCTCCCCCTGAACCTGAAACATAAAATGAATGCAATTGTTGTTGTTAACTTGTTTATTGCAGCTTATAATGGTTAC
AAATAAAGCAATAGCATCACAAATTTCACAAATAAAGCATTTTTTTCACTGCATTCTAGTTGTGGTTTGTCCAAACTCATCAATGTAT CTTATCATGTCTGGATC

The DNA sequence for the CMV enhancer/CMV promoter used and the DNA sequence for the SV40 polyadenylation tail used are the same as reported in Table 11a above.

Example 12: Gas Vesicle Expression System

The GVES that includes the GVPB gene expression cassette of Table 11 with the GVPC construct of Table 13 and the GVP booster plasmid of Table 14, illustrated in FIG. 12D, is able to robustly express GVs in mammalian cells as detected by TEM and BURST ultrasound. The sequences of the corresponding exemplary GVES are herein also indicated as mARG.

The GVES of this example provides a polycistronic GVES. The data of the experiments illustrated in FIGS. 9A-9C and 12 were collected using the GVES described in Example 9 for monocistronic cassettes and Example 12A for polycistronic cassettes.

Example 13: Gas Vesicle Expression System

The mARG GVES can be cloned within the piggyBac backbone for integration in the genome of mammalian cells, as illustrated in FIG. 13A. The corresponding constructs are reported in Tables 15, 16 and 17 below.

TABLE 15 Construct comprising the GVPB cassette Construct SEQUENCESEQ ID NO: Piggybac CCCTAGAAAGATAATCATATTGTGACGTACGTTAAAGATAAT 445transposon CATGTGTAAAATTGACGCATGTGTTTTATCGGTCTGTATATCG containingAGGTTTATTTATTAATTTGAATAGATATTAAGTTTTATTATATT gvpBTACACTTACATACTAATAATAAATTCAACAAACAATTTATTTATGTTTATTTATTTATTAAAAAAAACAAAAACTCAAAATTTCTTCTATAAAGTAACAAAACTTTTATGAGGGACAGCCCCCCCCCAAAGCCCCCAGGGATGTAATTACGTCCCTCCCCCGCTAGGGGGCAGCAGCGAGCCGCCCGGGGCTCCGCTCCGGTCCGGCGCTCCCCCCGCATCCCCGAGCCGGCAGCGTGCGGGGACAGCCCGGGCACGGGGAAGGTGGCACGGGATCGCTTTCCTCTGAACGCTTCTCGCTGCTCTTTGAGCCTGCAGACACCTGGGGGGATACGGGGAAAAGGCCTCCACGGCCACTAGTTTCACTCGAGTTTACTCCCTATCAGTGATAGAGAACGTATGAAGAGTTTACTCCCTATCAGTGATAGAGAACGTATGCAGACTTTACTCCCTATCAGTGATAGAGAACGTATAAGGAGTTTACTCCCTATCAGTGATAGAGAACGTATGACCAGTTTACTCCCTATCAGTGATAGAGAACGTATCTACAGTTTACTCCCTATCAGTGATAGAGAACGTATATCCAGTTTACTCCCTATCAGTGATAGAGAACGTATGTCGAGGTAGGCGTGTACGGTGGGCGCCTATAAAAGCAGAGCTCGTTTAGTGAACCGTCAGATCGCCTGGAGCAATTCCACAACACTTTTGTCTTATACTTGGTACCTATGCATGCCACCATGAGCATCCAGAAGTCCACCAACAGCAGCAGCCTGGCCGAAGTGATCGACCGGATCCTGGACAAGGGCATCGTGATCGACGCCTTCGCCAGAGTGTCCGTCGTGGGCATCGAGATCCTGACCATCGAGGCCAGAGTCGTGATCGCCAGCGTGGACACCTGGCTGAGATATGCCGAAGCCGTGGGCCTGCTGCGGGACGACGTGGAAGAAAATGGCCTGCCCGAGCGGAGCAACAGCTCTGAGGGACAGCCCCGGTTCAGCATCTGAACTAAATCGCACTGTCGGCGTCCCCCCCTAACGTTACTGGCCGAAGCCGCTTGGAATAAGGCCGGTGTGCGTTTGTCTATATGTTATTTTCCACCATATTGCCGTCTTTTGGCAATGTGAGGGCCCGGAAACCTGGCCCTGTCTTCTTGACGAGCATTCCTAGGGGTCTTTCCCCTCTCGCCAAAGGAATGCAAGGTCTGTTGAATGTCGTGAAGGAAGCAGTTCCTCTGGAAGCTTCTTGAAGACAAACAACGTCTGTAGCGACCCTTTGCAGGCAGCGGAACCCCCCACCTGGCGACAGGTGCCTCTGCGGCCAAAAGCCACGTGTATAAGATACACCTGCAAAGGCGGCACAACCCCAGTGCCACGTTGTGAGTTGGATAGTTGTGGAAAGAGTCAAATGGCTCTCCTCAAGCGTATTCAACAAGGGGCTGAAGGATGCCCAGAAGGTACCCCATTGTATGGGATCTGATCTGGGGCCTCGGTGCACATGCTTTACATGTGTTTAGTCGAGGTTAAAAAACGTCTAGGCCCCCCGAACCACGGGGACGTGGTTTTCCTTTGAAAAACACGATGATAATATGGCCACAACCATGGTGAGCAAGGGCGAGGAGGATAACATGGCCATCATCAAGGAGTTCATGCGCTTCAAGGTGCACATGGAGGGCTCCGTGAACGGCCACGAGTTCGAGATCGAGGGCGAGGGCGAGGGCCGCCCCTACGAGGGCACCCAGACCGCCAAGCTGAAGGT
GACCAAGGGCGGCCCCCTGCCCTTCGCCTGGGACATCCTGTCCCCTCAGTTCATGTACGGCTCCAAGGCCTACGTGAAGCACCCCGCCGACATCCCCGACTACTTGAAGCTGTCCTTCCCCGAGGGCTTCAAGTGGGAGCGCGTGATGAACTTCGAGGACGGCGGCGTGGTGACCGTGACCCAGGACTCCTCCCTGCAGGACGGCGAGTTCATCTACAAGGTGAAGCTGCGCGGCACCAACTTCCCCTCCGACGGCCCCGTAATGCAGAAGAAGACCATGGGCTGGGAGGCCTCCTCCGAGCGGATGTACCCCGAGGACGGCGCCCTGAAGGGCGAGATCAAGCAGAGGCTGAAGCTGAAGGACGGCGGCCACTACGACGCTGAGGTCAAGACCACCTACAAGGCCAAGAAGCCCGTGCAGCTGCCCGGCGCCTACAACGTCAACATCAAGTTGGACATCACCTCCCACAACGAGGACTACACCATCGTGGAACAGTACGAACGCGCCGAGGGCCGCCACTCCACCGGCGGCATGGACGAGCTGTACAAGTGAACTAGTTCGTTAACTAAACTTGTTTATTGCAGCTTATAATGGTTACAAATAAAGCAATAGCATCACAAATTTCACAAATAAAGCATTTTTTTCACTGCATTCTAGTTGTGGTTTGTCCAAACTCATCAATGTATCTTATCATGTCTGGAATTGACTCAAATGATGTCAATTAGTCTATCAGAAGCTCATCTGGTCTCCCTTCCGGGGGACAAGACATCCCTGTTTAATATTTAAACAGCAGTGTTCCCAAACTGGGTTCTTATATCCCTTGCTCTGGTCAACCAGGTTGCAGGGTTTCCTGTCCTCACAGGAACGAAGTCCCTAAAGAAACAGTGGCAGCCAGGTTTAGCCCCGGAATTGACTGGATTCCTTTTTTAGGGCCCATTGGTATGGCTTTTTCCCCGTATCCCCCCAGGTGTCTGCAGGCTCAAAGAGCAGCGAGAAGCGTTCAGAGGAAAGCGATCCCGTGCCACCTTCCCCGTGCCCGGGCTGTCCCCGCACGCTGCCGGCTCGGGGATGCGGGGGGAGCGCCGGACCGGAGCGGAGCCCCGGGCGGCTCGCTGCTGCCCCCTAGCGGGGGAGGGACGTAATTACATCCCTGGGGGCTTTGGGGGGGGGCTGTCCCTGATATCTATAACAAGAAAATATATATATAATAAGTTATCACGTAAGTAGAACATGAAATAACAATATAATTATCGTATGAGTTAAATCTTAAAAGTCACGTAAAAGATAATCATGCGTCATTTTGACTCACGCGGTCGTTATAGTTCAAAATCAGTGACACTTACCGCATTGACAAGCACGCCTCACGGGAGCTCCAAGCGGCGACTGAGATGTCCTAAATGCACAGCGACGGATTCGCGCTATTTAGAAAGAGAGAGCAATATTTCAAGAATGCATGCGTCAATTTTACGCAGACTATCTTTCTAGGG

TABLE 16 GVPC construct comprising one additional GVP cassette ConstructSEQUENCE SEQ ID NO: PiggyBac CCCTAGAAAGATAATCATATTGTGACGTACGTTAAAGATAAT446 transposon CATGTGTAAAATTGACGCATGTGTTTTATCGGTCTGTATATCG containingAGGTTTATTTATTAATTTGAATAGATATTAAGTTTTATTATATT gvpNFGLSTACACTTACATACTAATAATAAATTCAACAAACAATTTATTTA KJU-TGTTTATTTATTTATTAAAAAAAACAAAAACTCAAAATTTCTT EmGFPCTATAAAGTAACAAAACTTTTATGAGGGACAGCCCCCCCCCAAAGCCCCCAGGGATGTAATTACGTCCCTCCCCCGCTAGGGGGCAGCAGCGAGCCGCCCGGGGCTCCGCTCCGGTCCGGCGCTCCCCCCGCATCCCCGAGCCGGCAGCGTGCGGGGACAGCCCGGGCACGGGGAAGGTGGCACGGGATCGCTTTCCTCTGAACGCTTCTCGCTGCTCTTTGAGCCTGCAGACACCTGGGGGGATACGGGGAAAAGGCCTCCACGGCCACTAGTTTCACTCGAGTTTACTCCCTATCAGTGATAGAGAACGTATGAAGAGTTTACTCCCTATCAGTGATAGAGAACGTATGCAGACTTTACTCCCTATCAGTGATAGAGAACGTATAAGGAGTTTACTCCCTATCAGTGATAGAGAACGTATGACCAGTTTACTCCCTATCAGTGATAGAGAACGTATCTACAGTTTACTCCCTATCAGTGATAGAGAACGTATATCCAGTTTACTCCCTATCAGTGATAGAGAACGTATGTCGAGGTAGGCGTGTACGGTGGGCGCCTATAAAAGCAGAGCTCGTTTAGTGAACCGTCAGATCGCCTGGAGCAATTCCACAACACTTTTGTCTTATACTTGGTACCTATGCATGCCACCATGACCGTGCTGACCGACAAGCGGAAGAAGGGCAGCGGCGCCTTCATCCAGGACGACGAGACAAAAGAGGTGCTGAGCAGAGCCCTGAGCTACCTGAAGTCCGGCTACAGCATCCACTTCACCGGACCTGCCGGCGGAGGCAAGACATCTCTGGCTAGAGCCCTGGCCAAGAAACGGAAGCGGCCCGTGATGCTGATGCACGGCAACCACGAGCTGAACAACAAGGACCTGATCGGCGATTTCACCGGCTACACCAGCAAAAAGGTGATCGACCAGTACGTGCGGAGCGTGTACAAGAAAGACGAACAGGTGTCCGAGAACTGGCAGGACGGCAGACTGCTGGAAGCCGTGAAGAATGGCTACACCCTGATCTACGACGAGTTCACCAGAAGCAAGCCCGCTACCAACAACATCTTCCTGAGCATCCTTGAGGAGGGCGTGCTGCCCCTGTACGGCGTGAAGATGACCGACCCTTTCGTGCGCGTGCACCCCGACTTCAGAGTGATCTTTACCAGCAACCCCGCCGAGTATGCCGGCGTGTACGATACCCAGGACGCCCTGCTGGACCGGCTGATCACCATGTTCATCGACTACAAGGACATCGACCGGGAAACCGCTATCCTGACCGAGAAAACTGACGTGGAAGAAGACGAGGCCCGGACCATCGTGACCCTGGTGGCCAACGTGCGGAACAGAAGCGGCGACGAGAATAGCAGCGGCCTGAGCCTGAGAGCCAGCCTGATGATTGCCACCCTGGCCACCCAGCAGGACATCCCTATCGATGGCAGCGACGAGGACTTCCAGACCCTGTGCATCGACATCCTGCACCACCCCCTGACCAAGTGCCTGGACGAAGAGAACGCCAAGAGCAAGGCCGAGAAGATCATTCTCGAAGAGTGCAAGAACATCGACACCGAGGAGAAGGGTGCCCCGGGATCTGGCGCAACAAATTTTAGTCTTTTAAAGCAGGCAGGAGACGTCGAGGAAA
ACCCTGGACCCGTGAGCGAGACAAACGAGACAGGCATCTACATCTTCAGCGCCATCCAGACAGACAAGGATGAGGAATTCGGCGCCGTGGAAGTGGAAGGGACCAAGGCTGAGACATTCCTGATCCGGTATAAGGACGCCGCCATGGTGGCCGCCGAAGTGCCCATGAAGATCTACCACCCCAACCGGCAGAACCTGCTGATGCACCAGAATGCCGTGGCCGCCATCATGGACAAGAACGACACCGTGATCCCCATCAGCTTCGGCAACGTGTTCAAGAGCAAAGAGGACGTGAAGGTGCTCCTGGAAAACCTGTACCCCCAGTTCGAGAAGCTGTTCCCCGCCATCAAGGGAAAGATCGAAGTGGGCCTGAAGGTGATCGGCAAGAAAGAGTGGCTCGAAAAGAAAGTGAACGAGAACCCCGAGCTGGAAAAAGTGTCCGCCAGCGTGAAGGGCAAGAGCGAGGCCGCTGGCTACTACGAGAGAATCCAGCTGGGCGGCATGGCCCAGAAGATGTTCACAAGCCTGCAGAAAGAAGTGAAAACCGACGTGTTCAGCCCCCTGGAAGAAGCCGCCGAGGCCGCCAAAGCCAATGAGCCTACAGGCGAAACAATGCTGCTGAACGCCAGCTTCCTGATCAACAGAGAGGATGAGGCCAAGTTCGACGAGAAAGTCAATGAGGCCCACGAGAACTGGAAGGATAAGGCCGACTTCCACTACAGCGGCCCCTGGCCCGCCTACAACTTCGTGAACATCCGGCTGAAGGTGGAAGAGAAGGGGGCACCTGGCTCGGGAGCGACCAACTTCTCATTACTCAAACAAGCCGGAGACGTTGAGGAGAATCCAGGCCCTGTGCTGCACAAGCTCGTGACCGCCCCCATCAACCTGGTCGTGAAGATCGGCGAGAAGGTGCAGGAAGAGGCCGACAAGCAGCTGTACGACCTGCCCACCATCCAGCAGAAGCTGATCCAGCTGCAGATGATGTTCGAGCTGGGCGAGATCCCCGAGGAAGCCTTCCAGGAAAAAGAGGACGAACTGCTGATGAGATACGAGATCGCCAAGCGGCGCGAGATTGAGCAGTGGGAAGAACTGACCCAGAAGCGGAATGAGGAAAGCGGTGCCCCGGGATCTGGCGCAACAAATTTTAGTCTTTTAAAGCAGGCAGGAGACGTCGAGGAAAACCCTGGACCCGTGGGCGAGCTGCTGTACCTCTACGGCCTGATCCCCACCAAAGAGGCCGCTGCTATCGAGCCCTTCCCATTCTACAAGGGCTTCGACGGCGAGCACAGCCTGTACCCTATCGCCTTCGACCAAGTGACCGCCGTGGTGTTCAAGCTGGACGCCGACACCTACAGCGAGAAAGTGATCCAGGAAAAGATGGAACAGGACATGAGCTGGCTGCAGGAAAAGGCCTTCCACCACCACGAGACAGTGGCCGCCCTGTATGAGGAATTCACCATCATCCCCCTGAAGTTCTGCACCATCTATAAGGGAGAGGAATCCCTGCAGGCCGCCATCGAGATCAACAAAGAGAAGATCGAAAACTCCCTGACCCTGCTGCAGGGCAACGAGGAATGGAACGTGAAGATCTACTGCGACGACACCGAGCTGAAGAAGGGCATCAGCGAGACAAACGAGAGCGTGAAGGCCAAGAAGCAGGAAATCAGCCACCTGAGCCCCGGCAGACAGTTCTTCGAGAAGAAGAAGATTGACCAGCTCATCGAGAAAGAGCTGGAACTGCACAAGAACAAAGTGTGCGAGGAAATCCACGACAAGCTGATTGAGCTGAGCCTCTACGACTCCGTGAAGAAGAACTGGTCCAAGGACGTGACAGGCGCTGCCGAACAGATGGCCTGGAACAGCGTGTTCCTGCTGCCCAGCCTGCAGATCACCAAGTTCGTGAACGAGATCGAGGAACTCCAGCAGCGGCTGGAGAACAAGGGATGGAAGTTCGAAGTGACCGGCCCCTGGCCTCCCTACCACTTCAGCAGCTTTGCCGGG
GCACCTGGCTCGGGAGCGACCAACTTCTCATTACTCAAACAAGCCGGAGACGTTGAGGAGAATCCAGGCCCTGTGAGCCTGAAGCAGAGCATGGAGAATAAGGATATCGCCCTGATCGACATCCTCGACGTGATCCTGGACAAGGGAGTGGCCATCAAGGGCGACCTGATCATCTCTATCGCCGGCGTGGACCTGGTGTACCTGGATCTGAGAGTGCTGATCTCCAGCGTGGAAACCCTGGTGCAGGCCAAAGAGGGCAACCACAAGCCCATCACCAGCGAGCAGTTCGACAAGCAGAAAGAGGAGCTGATGGACGCCACCGGCCAGCCCAGCAAGTGGACAAATCCTCTGGGCAGCGGCGCTCCCGGGTCAGGTGCCACGAATTTTTCGTTGTTGAAGCAAGCTGGGGATGTTGAAGAGAACCCAGGGCCTGTGCAGCCCGTGTCCCAGGCCAACGGCAGAATCCACCTGGATCCCGATCAGGCCGAACAGGGACTGGCCCAGCTCGTGATGACCGTGATCGAGCTGCTGCGGCAGATCGTGGAACGGCACGCCATGAGAAGAGTGGAAGGCGGCACCCTGACCGACGAGCAGATCGAGAATCTGGGAATCGCTCTGATGAACCTGGAGGAGAAGATGGACGAGCTGAAAGAGGTGTTCGGACTGGACGCTGAGGATCTGAACATCGACCTGGGCCCTCTGGGCAGCCTGCTGGGTGCCCCGGGATCTGGCGCAACAAATTTTAGTCTTTTAAAGCAGGCAGGAGACGTCGAGGAAAACCCTGGACCCGTGGCCGTGGAACACAACATGCAGAGCAGCACCATCGTGGACGTGCTGGAAAAGATCCTGGACAAGGGCGTCGTGATCGCCGGGGACATCACAGTGGGAATCGCCGACGTGGAACTGCTGACCATCAAGATCCGGCTGATCGTGGCCAGCGTGGACAAGGCCAAAGAAATCGGCATGGATTGGTGGGAGAACGACCCCTACCTGAGCAGCAAGGGCGCCAACAACAAGGCTCTGGAAGAGGAAAACAAGATGCTGCACGAGCGGCTGAAAACACTGGAAGAGAAGATCGAGACAAAGCGCGGGGCACCTGGCTCGGGAGCGACCAACTTCTCATTACTCAAACAAGCCGGAGACGTTGAGGAGAATCCAGGCCCTGTGAGCACCGGCCCCAGCTTCAGCACCAAGGACAACACCCTGGAATACTTCGTGAAGGCCAGCAACAAGCACGGCTTTAGCCTCGACATCAGCCTGAACGTGAATGGGGCCGTGATTAGCGGCACCATGATCAGCGCCAAAGAGTACTTCGACTACCTGAGCGAGACATTCGAAGAGGGCAGCGAAGTGGCCCAGGCCCTGTCTGAGCAGTTTAGCCTGGCTAGCGAGGCCTCCGAGTCTAATGGCGAAGCCGAGGCCCACTTCATCCACCTGAAGAACACCAAGATCTACTGCGGCGACAGCAAGAGCACCCCCAGCAAGGGCAAGATCTTCTGGCGCGGCAAGATCGCCGAGGTGGACGGATTCTTCCTGGGAAAAATCAGCGACGCCAAGTCCACCAGCAAGAAGTCCAGCGGCGCTCCCGGGTCAGGTGCCACGAATTTTTCGTTGTTGAAGCAAGCTGGGGATGTTGAAGAGAACCCAGGGCCTGTGGTGTCCAAGGGCGAGGAACTGTTCACCGGCGTGGTGCCCATCCTGGTGGAACTGGATGGCGACGTGAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAAGGCGACGCCACATACGGAAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCTTGGCCTACCCTCGTGACCACACTGACCTACGGCGTGCAGTGCTTCGCCAGATACCCCGACCACATGAAGCAGCACGATTTCTTCAAGAGCGCCATGCCCGAGGGCTACGTGCAGGAACGGACCATCTTCTTCAAGGACGACGGCAACTACAAGACAAGAGCCGAAGTGAAGTTCGA
GGGCGACACCCTCGTGAACCGGATCGAGCTGAAGGGCATCGACTTCAAAGAGGATGGCAACATCCTGGGCCACAAGCTGGAGTACAACTACAACAGCCACAAGGTGTACATCACCGCCGACAAGCAGAAAAACGGCATCAAAGTGAACTTCAAGACCCGGCACAACATCGAGGACGGCAGCGTGCAGCTGGCCGACCACTACCAGCAGAACACCCCCATCGGAGATGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACACAAAGCGCCCTGAGCAAGGACCCCAACGAGAAGCGGGACCACATGGTGCTGCTGGAATTTGTGACCGCCGCTGGCATCACCCTGGGCATGGACGAGCTGTACAAGTGAACTAGTTCGTTAACTAAACTTGTTTATTGCAGCTTATAATGGTTACAAATAAAGCAATAGCATCACAAATTTCACAAATAAAGCATTTTTTTCACTGCATTCTAGTTGTGGTTTGTCCAAACTCATCAATGTATCTTATCATGTCTGGAATTGACTCAAATGATGTCAATTAGTCTATCAGAAGCTCATCTGGTCTCCCTTCCGGGGGACAAGACATCCCTGTTTAATATTTAAACAGCAGTGTTCCCAAACTGGGTTCTTATATCCCTTGCTCTGGTCAACCAGGTTGCAGGGTTTCCTGTCCTCACAGGAACGAAGTCCCTAAAGAAACAGTGGCAGCCAGGTTTAGCCCCGGAATTGACTGGATTCCTTTTTTAGGGCCCATTGGTATGGCTTTTTCCCCGTATCCCCCCAGGTGTCTGCAGGCTCAAAGAGCAGCGAGAAGCGTTCAGAGGAAAGCGATCCCGTGCCACCTTCCCCGTGCCCGGGCTGTCCCCGCACGCTGCCGGCTCGGGGATGCGGGGGGAGCGCCGGACCGGAGCGGAGCCCCGGGCGGCTCGCTGCTGCCCCCTAGCGGGGGAGGGACGTAATTACATCCCTGGGGGCTTTGGGGGGGGGCTGTCCCTGATATCTATAACAAGAAAATATATATATAATAAGTTATCACGTAAGTAGAACATGAAATAACAATATAATTATCGTATGAGTTAAATCTTAAAAGTCACGTAAAAGATAATCATGCGTCATTTTGACTCACGCGGTCGTTATAGTTCAAAATCAGTGACACTTACCGCATTGACAAGCACGCCTCACGGGAGCTCCAAGCGGCGACTGAGATGTCCTAAATGCACAGCGACGGATTCGCGCTATTTAGAAAGAGAGAGCAATATTTCAAGAATGCATGCGTCAATTTTACGCAGACTATCTTTCTAGGG

TABLE 17 Exemplary Booster Construct Construct SEQUENCE SEQ ID NO:PiggyBac CCCTAGAAAGATAATCATATTGTGACGTACGTTAAAGATAAT 447 transposonCATGTGTAAAATTGACGCATGTGTTTTATCGGTCTGTATATCG containingAGGTTTATTTATTAATTTGAATAGATATTAAGTTTTATTATATT gvpJFGLKTACACTTACATACTAATAATAAATTCAACAAACAATTTATTTATGTTTATTTATTTATTAAAAAAAACAAAAACTCAAAATTTCTTCTATAAAGTAACAAAACTTTTATGAGGGACAGCCCCCCCCCAAAGCCCCCAGGGATGTAATTACGTCCCTCCCCCGCTAGGGGGCAGCAGCGAGCCGCCCGGGGCTCCGCTCCGGTCCGGCGCTCCCCCCGCATCCCCGAGCCGGCAGCGTGCGGGGACAGCCCGGGCACGGGGAAGGTGGCACGGGATCGCTTTCCTCTGAACGCTTCTCGCTGCTCTTTGAGCCTGCAGACACCTGGGGGGATACGGGGAAAAGGCCTCCACGGCCACTAGTTTTCCCCGAAAAGTGCCACCTGACGTCGGCAGTGAAAAAAATGCTTTATTTGTGAAATTTGTGATGCTATTGCTTTATTTGTAACCATTATAAGCTGCAATAAACAAGTTAACAACAACAATTGCATTCATTTTATGTTTCAGGTTCAGGGGGAGGTGTGGGAGGTTTTTTAAAGCAAGTAAAACCTCTACAAATGTGGTATGGCTGATTATGATCCTCTAGACATATGCTGCAGTCACTTGTACAGCTCATCCATGCCCAGGGTGATGCCAGCGGCGGTCCGAAATTCCAGCAGCACCATGTGGTCCCGCTTCTCGTTGGGGTCCTTGCTCAGCACGCTCTGGGTGCTCAGGTAGTGGCTATCAGGCAGCAGCACGGGGCCATCTCCGATGGGGGTGTTCTGCTGGTAGTGGTCGGCCAGCTGCACGCTGCCATCTTCCACGTTGTGCCGGATCTTGAAGTTCACTTTGATGCCGTTTTTCTGCTTCACGGCCATGATGTAGATGTTGTGGCTGTTGAAGTTGTACTCCAGCTTGTGGCCCAGGATGTTGCCGTCCTCTTTGAAGTCCACGCCCTTCAGCTCGATCCGGTTCACGAGGGTGTCGCCCTCGAACTTCACTTCGGCTCTGGTCTTGTAGGTGCCGTCGTCCTTGAAGAAGATGGTCCGTTCCTGCACGTAGCCCTCGGGCATGGCGCTCTTGAAGAAATCGTGCTGCTTCATGTGGTCGGGGTATCTGGCGAAGCACTGCACGCCGTGAGACAGTGTGGTCACGAGGGTAGGCCAAGGCACGGGCAGCTTGCCGGTGGTGCAGATGAACTTCAGGGTCAGCTTGCCATTTGTGGCGTCGCCTTCGCCCTCTCCCCGCACAGAGAACTTGTGGCCGTTCACGTCGCCATCCAGTTCCACCAGGATGGGCACCACGCCGGTGAACAGTTCCTCGCCCTTGGACACCATGGTGAAGGGTACTGGATCCGAGCTCGGTACCTGCAGGCGTACCTTCTATAGTGTCACCTAAATGCGATCTGACGGTTCACTAAACGAGCTCTGCTTATATAGGCCTCCCACCGTACACGCCACCTCGACATACTCGAGTTTACTCCCTATCAGTGATAGAGAACGTATGAAGAGTTTACTCCCTATCAGTGATAGAGAACGTATGCAGACTTTACTCCCTATCAGTGATAGAGAACGTATAAGGAGTTTACTCCCTATCAGTGATAGAGAACGTATGACCAGTTTACTCCCTATCAGTGATAGAGAACGTATCTACAGTTTACTCCCTATCAGTGATAGAGAACGTATATCCAGTTTACTCCCTATCAGTGATAGAGAACGTATGTCGAGGTAGGCGTGTACGGTGGGCGCCTATAAAAGCAGAGCTCGTTTAGTGAACCGTCAGA
TCGCCTGGAGCAATTCCACAACACTTTTGTCTTATACTTGGTACCTATGCATGCCACCATGGCCGTGGAACACAACATGCAGAGCAGCACCATCGTGGACGTGCTGGAAAAGATCCTGGACAAGGGCGTCGTGATCGCCGGGGACATCACAGTGGGAATCGCCGACGTGGAACTGCTGACCATCAAGATCCGGCTGATCGTGGCCAGCGTGGACAAGGCCAAAGAAATCGGCATGGATTGGTGGGAGAACGACCCCTACCTGAGCAGCAAGGGCGCCAACAACAAGGCCCTGGAAGAGGAAAACAAGATGCTGCACGAGCGGCTGAAAACACTGGAAGAGAAGATCGAGACAAAGCGCGGTGCCCCGGGATCTGGCGCAACAAATTTTAGTCTTTTAAAGCAGGCAGGAGACGTCGAGGAAAACCCTGGACCCGTGAGCGAGACAAACGAGACAGGCATCTACATCTTCAGCGCCATCCAGACAGACAAGGATGAGGAATTCGGCGCCGTGGAAGTGGAAGGGACCAAGGCTGAGACATTCCTGATCCGGTATAAGGACGCCGCCATGGTGGCCGCCGAAGTGCCCATGAAGATCTACCACCCCAACCGGCAGAACCTGCTGATGCACCAGAATGCCGTGGCCGCCATCATGGACAAGAACGACACCGTGATCCCCATCAGCTTCGGCAACGTGTTCAAGAGCAAAGAGGACGTGAAGGTGCTCCTGGAAAACCTGTACCCCCAGTTCGAGAAGCTGTTCCCCGCCATCAAGGGAAAGATCGAAGTGGGCCTGAAGGTGATCGGCAAGAAAGAGTGGCTCGAAAAGAAAGTGAACGAGAACCCCGAGCTGGAAAAAGTGTCCGCCAGCGTGAAGGGCAAGAGCGAGGCCGCTGGCTACTACGAGAGAATCCAGCTGGGCGGCATGGCCCAGAAGATGTTCACAAGCCTGCAGAAAGAAGTGAAAACCGACGTGTTCAGCCCCCTGGAAGAAGCCGCCGAGGCCGCCAAAGCCAATGAGCCTACAGGCGAAACAATGCTGCTGAACGCCAGCTTCCTGATCAACAGAGAGGATGAGGCCAAGTTCGACGAGAAAGTCAATGAGGCCCACGAGAACTGGAAGGATAAGGCCGACTTCCACTACAGCGGCCCCTGGCCCGCCTACAACTTCGTGAACATCCGGCTGAAGGTGGAAGAGAAGGGGGCACCTGGCTCGGGAGCGACCAACTTCTCATTACTCAAACAAGCCGGAGACGTTGAGGAGAATCCAGGCCCTGTGCTGCACAAGCTCGTGACCGCCCCCATCAACCTGGTCGTGAAGATCGGCGAGAAGGTGCAGGAAGAGGCCGACAAGCAGCTGTACGACCTGCCCACCATCCAGCAGAAGCTGATCCAGCTGCAGATGATGTTCGAGCTGGGCGAGATCCCCGAGGAAGCCTTCCAGGAAAAAGAGGACGAACTGCTGATGAGATACGAGATCGCCAAGCGGCGCGAGATTGAGCAGTGGGAAGAACTGACCCAGAAGCGGAATGAGGAAAGCGGTGCCCCGGGATCTGGCGCAACAAATTTTAGTCTTTTAAAGCAGGCAGGAGACGTCGAGGAAAACCCTGGACCCGTGGGCGAGCTGCTGTACCTCTACGGCCTGATCCCCACCAAAGAGGCCGCTGCTATCGAGCCCTTCCCATTCTACAAGGGCTTCGACGGCGAGCACAGCCTGTACCCTATCGCCTTCGACCAAGTGACCGCCGTGGTGTTCAAGCTGGACGCCGACACCTACAGCGAGAAAGTGATCCAGGAAAAGATGGAACAGGACATGAGCTGGCTGCAGGAAAAGGCCTTCCACCACCACGAGACAGTGGCCGCCCTGTATGAGGAATTCACCATCATCCCCCTGAAGTTCTGCACCATCTATAAGGGAGAGGAATCCCTGCAGGCCGCCATCGAGATCAACAAAGAGAAGATCGAAAACTCCCTGACCCTGCTGCAGGGCAACGAGGAATGGA
ACGTGAAGATCTACTGCGACGACACCGAGCTGAAGAAGGGCATCAGCGAGACAAACGAGAGCGTGAAGGCCAAGAAGCAGGAAATCAGCCACCTGAGCCCCGGCAGACAGTTCTTCGAGAAGAAGAAGATTGACCAGCTCATCGAGAAAGAGCTGGAACTGCACAAGAACAAAGTGTGCGAGGAAATCCACGACAAGCTGATTGAGCTGAGCCTCTACGACTCCGTGAAGAAGAACTGGTCCAAGGACGTGACAGGCGCTGCCGAACAGATGGCCTGGAACAGCGTGTTCCTGCTGCCCAGCCTGCAGATCACCAAGTTCGTGAACGAGATCGAGGAACTCCAGCAGCGGCTGGAGAACAAGGGATGGAAGTTCGAAGTGACCGGCCCCTGGCCTCCCTACCACTTCAGCAGCTTTGCCGGGGCACCTGGCTCGGGAGCGACCAACTTCTCATTACTCAAACAAGCCGGAGACGTTGAGGAGAATCCAGGCCCTGTGCAGCCCGTGTCCCAGGCCAACGGCAGAATCCACCTGGATCCCGATCAGGCCGAACAGGGACTGGCCCAGCTCGTGATGACCGTGATCGAGCTGCTGCGGCAGATCGTGGAACGGCACGCCATGAGAAGAGTGGAAGGCGGCACCCTGACCGACGAGCAGATCGAGAATCTGGGAATCGCCCTGATGAACCTGGAAGAGAAGATGGACGAGCTGAAAGAGGTGTTCGGACTGGACGCCGAGGACCTGAACATCGACCTGGGCCCTCTGGGCAGCCTGCTGTGAACTAGTTCGATACCGTCGACCGTTAACTAAACTTGTTTATTGCAGCTTATAATGGTTACAAATAAAGCAATAGCATCACAAATTTCACAAATAAAGCATTTTTTTCACTGCATTCTAGTTGTGGTTTGTCCAAACTCATCAATGTATCTTATCATGTCTGGAATTGACTCAAATGATGTCAATTAGTCTATCAGAAGCTCATCTGGTCTCCCTTCCGGGGGACAAGACATCCCTGTTTAATATTTAAACAGCAGTGTTCCCAAACTGGGTTCTTATATCCCTTGCTCTGGTCAACCAGGTTGCAGGGTTTCCTGTCCTCACAGGAACGAAGTCCCTAAAGAAACAGTGGCAGCCAGGTTTAGCCCCGGAATTGACTGGATTCCTTTTTTAGGGCCCATTGGTATGGCTTTTTCCCCGTATCCCCCCAGGTGTCTGCAGGCTCAAAGAGCAGCGAGAAGCGTTCAGAGGAAAGCGATCCCGTGCCACCTTCCCCGTGCCCGGGCTGTCCCCGCACGCTGCCGGCTCGGGGATGCGGGGGGAGCGCCGGACCGGAGCGGAGCCCCGGGCGGCTCGCTGCTGCCCCCTAGCGGGGGAGGGACGTAATTACATCCCTGGGGGCTTTGGGGGGGGGCTGTCCCTGATATCTATAACAAGAAAATATATATATAATAAGTTATCACGTAAGTAGAACATGAAATAACAATATAATTATCGTATGAGTTAAATCTTAAAAGTCACGTAAAAGATAATCATGCGTCATTTTGACTCACGCGGTCGTTATAGTTCAAAATCAGTGACACTTACCGCATTGACAAGCACGCCTCACGGGAGCTCCAAGCGGCGACTGAGATGTCCTAAATGCACAGCGACGGATTCGCGCTATTTAGAAAGAGAGAGCAATATTTCAAGAATGCATGCGTCAATTT TACGCAGACTATCTTTCTAGGG

The DNA sequences of the additional regulatory regions of the cassettes are reported in Table 18 below.

TABLE 18 Additional elements of GV gene expression cassettes ElementSequence SEQ ID NO 5′ITR CCCTAGAAAGATAATCATATTGTGACGTACGTTAAAGATAAT 448CATGTGTAAAATTGACGCATGTGTTTTATCGGTCTGTATATCGAGGTTTATTTATTAATTTGAATAGATATTAAGTTTTATTATATTTACACTTACATACTAATAATAAATTCAACAAACAATTTATTTATGTTTATTTATTTATTAAAAAAAACAAAAACTCAAAATT TCTTCTATAAAGTAACAAAACTTTTA 5′GAGGGACAGCCCCCCCCCAAAGCCCCCAGGGATGTAATTACGTC 449 insulatorCCTCCCCCGCTAGGGGGCAGCAGCGAGCCGCCCGGGGCTCCGCT elementCCGGTCCGGCGCTCCCCCCGCATCCCCGAGCCGGCAGCGTGCGGGGACAGCCCGGGCACGGGGAAGGTGGCACGGGATCGCTTTCCTCTGAACGCTTCTCGCTGCTCTTTGAGCCTGCAGACACCTGGGGGGAT ACGGGGAAAA TRE3GGAGTTTACTCCCTATCAGTGATAGAGAACGTATGAAGAGTTTACT 450 promoterCCCTATCAGTGATAGAGAACGTATGCAGACTTTACTCCCTATCAGTGATAGAGAACGTATAAGGAGTTTACTCCCTATCAGTGATAGAGAACGTATGACCAGTTTACTCCCTATCAGTGATAGAGAACGTATCTACAGTTTACTCCCTATCAGTGATAGAGAACGTATATCCAGTTTACTCCCTATCAGTGATAGAGAACGTATGTCGAGGTAGGCGTGTACGGTGGGCGCCTATAAAAGCAGAGCTCGTTTAGTGAACCGTCAGATCGCCTGGAGCAATTCCACAACACTTTTGTCTTATACTT SV40AACTTGTTTATTGCAGCTTATAATGGTTACAAATAAAGCAATAGC 451 polyadeny-ATCACAAATTTCACAAATAAAGCATTTTTTTCACTGCATTCTAGTT lationGTGGTTTGTCCAAACTCATCAATGTATCTTA tail 3′TTTTCCCCGTATCCCCCCAGGTGTCTGCAGGCTCAAAGAGCAGCG 452 insulatorAGAAGCGTTCAGAGGAAAGCGATCCCGTGCCACCTTCCCCGTGCC elementCGGGCTGTCCCCGCACGCTGCCGGCTCGGGGATGCGGGGGGAGCGCCGGACCGGAGCGGAGCCCCGGGCGGCTCGCTGCTGCCCCCTAGCGGGGGAGGGACGTAATTACATCCCTGGGGGCTTTGGGGGGGG GCTGTCCCT 3′ IRTGATATCTATAACAAGAAAATATATATATAATAAGTTATCACGTAA 453GTAGAACATGAAATAACAATATAATTATCGTATGAGTTAAATCTTAAAAGTCACGTAAAAGATAATCATGCGTCATTTTGACTCACGCGGTCGTTATAGTTCAAAATCAGTGACACTTACCGCATTGACAAGCACGCCTCACGGGAGCTCCAAGCGGCGACTGAGATGTCCTAAATGCACAGCGACGGATTCGCGCTATTTAGAAAGAGAGAGCAATATTTCAAGAATGCATGCGTCAATTTTACGCAGACTATCTTTCTAGGG

The GVES exemplified here has been used in the experiments illustrated in FIGS. 11 and 13-24.

Example 14: Identification of Cassettes Resulting in Expression of GV in Mammalian Cells

Experiments were performed that can be used to identify the elements of a cassette for the expression of GV genes in mammalian cells, inclusive of regulatory genes and gene configuration, with the GVES and regulatory regions reported in Example 13 above.

A first set of experiments was performed to identify the features of an exemplary genetic construct to be used to express exemplary GV genes in a mammalian cell.

In particular, a genetic construct was provided that is configured to obtain stable genomic integration of mCherry in HEK-293 cells. The construct, schematically shown in FIG. 10A, contained a 5′ITR for piggyBac transposase, a chicken beta-globin insulator, and a TRE3G promoter upstream of the mCherry sequence, with an SV40 polyadenylation element downstream, followed by a chicken beta-globin insulator and a 3′ITR for piggyBac transposase.

HEK-293 cells were transfected with the construct of FIG. 10A and a plasmid encoding the piggyBac transposase, and subjected to FACS. The genomic integration of the construct was detected as reported in FIG. 10B.

The regulatory regions of the above construct were therefore used to express the exemplary GVES of Example 13, herein also indicated as mARG. In particular, three constructs were provided using the regulatory sequences tested in FIGS. 10A-10B, one including a GVPB cassette and one including a GVPC construct, as shown in FIG. 11 panel A.

The constructs of FIG. 11 panel A were used to generate polyclonal cell lines in HEK293-tetON cells, and the HEK293-tetON cells transfected with the integrating mARG constructs of FIG. 11 panel A were subjected to fluorescence-activated cell sorting.

The FACS results for mARG-expressing HEK293-tetON cells, in which cells were binned by relative expression level into subtypes 1-4 (FIG. 11 panels B and C), showed that all subtypes produced GVs, although the subtypes differed in the average number of gas vesicles per cell (FIG. 11 panel D).

Similar experiments were performed in CHO-tetON cells, which were transfected with the constructs of FIG. 11 panel A to generate polyclonal cell lines in CHO-tetON cells.

The FACS results for mARG-expressing CHO-tetON cells are reported in FIG. 11 panel E, a representative TEM image of buoyancy-enriched lysate from sorted CHO-tetON cells is reported in FIG. 11 panel G, and the approximate gas vesicle yield for the sorted mARG-expressing CHO-tetON cells is reported in FIG. 11 panel G. These results illustrate that mARG expression is possible in different mammalian cells, for example HEK293 and CHO-K1.

Example 15: Transfection and Expression of an Exemplary GV Gene Cluster with GVES

Codon-optimized gas vesicle genes from Table 8 were cloned from different microbial species into unique monocistronic plasmids, and mammalian cells were transiently transfected using polyethylenimine nanoparticles (FIG. 12, panel A).

This assay uses the combination of two stochastic events to sample a broad range of gene stoichiometries and expression levels. First, the heterogeneous loading of plasmids in each nanoparticle and, second, the variable delivery of each nanoparticle to the nucleus together result in a combinatorial distribution of plasmid copy numbers during each transfection experiment.
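The way two stochastic events combine into a spread of plasmid stoichiometries can be sketched with a toy simulation. This is only an illustration under stated assumptions: the uniform distributions, the limits `max_particles` and `max_copies`, and the function name `simulate_copy_numbers` are hypothetical and are not taken from the experiments described herein.

```python
import random
from collections import Counter

def simulate_copy_numbers(n_cells, n_plasmids, max_particles=5, max_copies=3, seed=0):
    """Return one tuple of per-plasmid copy numbers for each simulated cell.

    Hypothetical model: uniform integer draws stand in for the unknown
    loading and delivery distributions.
    """
    rng = random.Random(seed)
    cells = []
    for _ in range(n_cells):
        copies = [0] * n_plasmids
        # Variable delivery: how many nanoparticles reach this nucleus.
        for _ in range(rng.randint(0, max_particles)):
            # Heterogeneous loading: copies of each plasmid in this particle.
            for i in range(n_plasmids):
                copies[i] += rng.randint(0, max_copies)
        cells.append(tuple(copies))
    return cells

cells = simulate_copy_numbers(n_cells=1000, n_plasmids=3)
distribution = Counter(cells)
print(len(distribution), "distinct copy-number combinations among", len(cells), "cells")
```

Each distinct tuple is one sampled stoichiometry, so a single simulated transfection already covers many relative gene dosages.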

Upon transfection, the cells were allowed to express the gas vesicle proteins for 72 hours and then gently lysed. The lysate was centrifuged to buoyancy-enrich any fully formed gas vesicles. Finally, the top fraction of the lysate was analyzed under transmission electron microscopy for presence and phenotype of gas vesicles.

Transfection of the gas vesicle genes from Halobacterium salinarum and Anabaena flos-aquae did not lead to the formation of gas vesicles detectable in mammalian cells by transmission electron microscopy (see Example 25). However, the genes from Bacillus megaterium reported in Example 12 were able to produce gas vesicles in mammalian cells that were detectable by transmission electron microscopy (FIG. 12, panel B).

The co-transfection of these three plasmids (see Example 12) was sufficient for robust expression of gas vesicles in cells; this system is herein referred to as the mammalian acoustic reporter gene (mammalian ARG) (FIG. 12, panel C).

The first plasmid encodes gas vesicle protein B, the second encodes all assembly factors and the third encodes the proteins requiring a boost in expression (FIG. 12, panel D).

Accordingly, a polycistronic plasmid was constructed containing eight gas vesicle genes connected with the porcine teschovirus-1 2A self-cleavage (p2A) element, as schematically shown in FIG. 12, panel D.

In particular, the schematic illustration of FIG. 12, panel D (middle and bottom) shows an exemplary polycistronic configuration according to the disclosure.

The construct in the middle of panel D comprises gvpN, F, G, L, S, K, J and U, with each two adjacent genes separated by a 2A self-cleaving element, as further exemplified in Example 12 and Table 16 above. The construct at the bottom of panel D comprises gvpJ, F, G, L, and K, with each two adjacent genes separated by a self-cleaving element, as exemplified in Table 17.

However, the gene stoichiometry of the one-to-one architecture illustrated in FIG. 12 panel D (middle, Table 16) was not optimal, since the co-transfection of this plasmid together with a plasmid encoding gas vesicle protein B did not lead to detectable gas vesicle expression in mammalian cells. By assaying the relative efficiency of gas vesicle protein expression from each gene in this plasmid, it became apparent that three gas vesicle genes (N, S and U) could be expressed at lower levels than gas vesicle genes J, F, G, L and K.

A booster plasmid was therefore provided to further express gas vesicle genes J, F, G, L and K, as further described in Example 12 and Table 17 above.

Example 16: Mammalian ARG can be Genomically Integrated

To test the generalizability of the mammalian ARG, the mARG formed in Example 13 was genomically integrated in human embryonic kidney (HEK) cells as well as Chinese hamster ovary (CHO) cells, allowing them to express gas vesicles, as exemplified in FIG. 11, using the construct illustrated in FIG. 11 panel A and FIG. 13 panel A.

Mammalian ARGs behaved similarly in both cell lines, and using transmission electron microscopy an average yield of one gas vesicle for every four cells was estimated (FIG. 11, panels D and G for HEK-tetON and CHO-tetON, respectively). This indicated that only a subpopulation of cells was optimally producing gas vesicles. FIG. 13, panel B illustrates a representative image of gas vesicles in the cytosol of HEK cells.

To select for this subpopulation, the Applicant screened 30 monoclonal HEK cell lines (FIG. 13 panels C and D), and 20% of the cell lines produced on average greater than one gas vesicle per cell.

The cell line yielding the highest expression of gas vesicles produced on average 45 gas vesicles per cell (FIG. 13, panel E) when induced with 1 μg/mL of doxycycline and 5 mM sodium butyrate for 72 hours, and the Applicant focused on this cell line for the remainder of this work. Importantly, the expression of gas vesicles was not toxic to cells, as determined using five different assays. These included observing that the shape of cells expressing mARGs did not change as a result of mARG expression (FIG. 13 panel I); measuring membrane integrity with trypan blue, the relative number of metabolically active cells with CellTiter-Glo®, and metabolic activity using resazurin reduction (FIG. 13, panel J); and a 6-day co-culture in which mARG-HEK cells showed only a minor growth disadvantage compared with mCherry-HEK cells (FIG. 13 panel K). In addition, a co-culture of mARG-HEK and HEK293T cells was compared with a co-culture of mCherry-HEK and HEK293T cells over 6 days by assaying the fraction of each cell type in the co-culture (FIG. 16). This showed that the expression of reporter genes (here mARG and mCherry) led to a decrease in the fraction of reporter gene-expressing cells relative to HEK293T cells.

Using transmission electron microscopy, as exemplified in FIG. 13 panel G, the gas vesicles produced in this cell line were measured to be on average 64±12 nm (standard deviation) wide and 276±212 nm (standard deviation) long, with some reaching aspect ratios greater than 30 (lengths larger than 1 micron) (FIG. 13, panel H). This corresponds to an average gas vesicle volume of 0.605 attoliters (ranging from 0.008 to 10 attoliters), assuming a tapered cylindrical shape, as will be understood by a skilled person. A representative TEM image of a 60-nm section through an mARG-HEK cell, showing an angled slice through two bundles of gas vesicles in the cytosol, is provided in FIG. 13 panel F.
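The volume estimate for a tapered cylinder can be sketched as follows. This is a minimal illustration, assuming the taper is modeled as two conical end caps; the cap height `cone_height_nm` and the function name are hypothetical, since the exact taper geometry is not specified here. Because the reported 0.605 attoliters is an average over individual vesicles, a calculation from the mean width and length is not expected to reproduce it exactly.

```python
import math

NM3_PER_ATTOLITER = 1e6  # 1 aL = 1e-18 L = 1e-21 m^3 = 1e6 nm^3

def tapered_cylinder_volume_al(width_nm, length_nm, cone_height_nm=0.0):
    """Volume in attoliters of a cylinder of total length length_nm
    whose two ends are cones of height cone_height_nm (assumed geometry)."""
    if 2 * cone_height_nm > length_nm:
        raise ValueError("cone caps cannot be longer than the vesicle")
    r = width_nm / 2
    body = math.pi * r**2 * (length_nm - 2 * cone_height_nm)
    caps = 2 * (math.pi * r**2 * cone_height_nm / 3)
    return (body + caps) / NM3_PER_ATTOLITER

# Mean dimensions from the text: 64 nm wide, 276 nm long.
upper = tapered_cylinder_volume_al(64, 276)                       # no taper: upper bound
tapered = tapered_cylinder_volume_al(64, 276, cone_height_nm=32)  # hypothetical cap height
print(f"untapered: {upper:.3f} aL, tapered: {tapered:.3f} aL")
```

The untapered cylinder gives an upper bound for a vesicle of the mean dimensions; any non-zero cap height lowers the estimate.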

Example 17: Ultrasound Imaging of Mammalian ARG-Expressing Cells

From previous studies, it was anticipated that gas vesicles encoded by the B. megaterium gene cluster would scatter ultrasound linearly (that is, scatter at the same ultrasound frequency that was insonated). Because mammalian cells themselves strongly scatter ultrasound linearly, this makes it challenging to detect any added echogenicity from the expressed gas vesicles.

To address this, the Applicant turned to a unique physical property of gas vesicles in order to extract a unique acoustic signature from the expressed gas vesicles. In particular, the Applicant surprisingly found that acoustic fields with pressures beyond the collapse threshold of gas vesicles cause a rapid change in volume, which transiently distorts the insonated acoustic field (FIG. 14, panel A) (see U.S. application Ser. No. 16/736,581 entitled “BURST Ultrasound Reconstruction with Signal Templates and related Methods and Systems” filed on Jan. 7, 2020, herein incorporated by reference in its entirety).

This can be used to sensitively detect gas vesicle-specific nonlinear signals at the moment of collapse. To image this, serial amplitude modulation images were acquired during and after the collapse of gas vesicles. This allows the steady-state background signal to be discriminated from the delta function-like signal obtained from the collapse of gas vesicles (FIG. 14, panels B and C). During the serial acquisition, each amplitude modulation sequence extracts nonlinear ultrasound echoes by transmitting two half-amplitude pulses whose echoes are digitally subtracted from the echo of a third, full-amplitude pulse. Using this imaging paradigm, no cytotoxicity from the collapse of gas vesicles was observed (FIG. 14, panel D).
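The subtraction step of an amplitude modulation sequence can be illustrated with a toy scatterer model. This is a sketch only: the quadratic response in `echo` and the coefficient `nonlinear_coeff` are hypothetical stand-ins chosen to show why linear echoes cancel, and do not describe the actual physics of gas vesicle collapse.

```python
def echo(amplitude, nonlinear_coeff):
    """Toy scatterer: echo = linear term + quadratic (nonlinear) term."""
    return amplitude + nonlinear_coeff * amplitude**2

def am_residual(amplitude, nonlinear_coeff):
    """Full-amplitude echo minus the two half-amplitude echoes."""
    full = echo(amplitude, nonlinear_coeff)
    half = echo(amplitude / 2, nonlinear_coeff)
    return full - 2 * half

linear_only = am_residual(1.0, 0.0)      # purely linear scatterer
with_nonlinear = am_residual(1.0, 0.2)   # scatterer with a nonlinear component
print(linear_only, with_nonlinear)
```

In this toy model the residual equals `nonlinear_coeff * amplitude**2 / 2`, so a purely linear background cancels and only the nonlinear contribution survives the subtraction.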

Using this new ultrasound imaging paradigm, the Applicant set out to measure the different characteristics of mammalian ARGs in vitro. To measure the effect of expression duration on the ultrasound intensity, cells were allowed to express gas vesicles for the specified number of days and 6×10⁶ cells were loaded into acoustically transparent agarose phantoms. After two days, cells expressing gas vesicles produced robust ultrasound contrast, which increased with expression duration (FIG. 14, panel F). Similar results were obtained by measuring fluorescence from the mCherry reporter expressed by the same cells under the same conditions (FIG. 17A).

Example 18: Using Mammalian ARGs to Monitor Circuit-Driven Gene Expression

It is often desirable to obtain a readout of the dynamic cellular function of cells in the body, for example, to investigate the activation of immune cells at the site of disease or the dynamics of a genetic pathway.

To test if mammalian ARGs can faithfully monitor circuit-driven gene expression, the Applicant measured the ultrasound response of the exemplary mammalian ARGs of Example 13 under the control of the tetracycline-inducible promoter (using the reverse tetracycline-controlled transactivator). FIG. 14, panel E illustrates mARGs controlled by a conditional promoter (e.g. a tetracycline-inducible promoter). The ultrasound contrast produced by cells followed the expected transfer function of the promoter, as measured by fluorescence (FIG. 17, panel B), confirming the ability of mARGs to follow the dynamics of cells using ultrasound (FIG. 14, panel G).

Next, the Applicant sought to determine the sensitivity of detecting mARG-expressing cells in a mixed cell population. For this, control cells expressing only mCherry were combined with gas vesicle-expressing cells at varying ratios. The Applicant was able to sensitively detect cells down to 2.5% of total cells, corresponding to 0.5% volumetric density or approximately 4 cells per voxel (FIG. 14, panel H). This sensitivity is expected to be sufficient for in vivo scenarios.

An alternative method to monitor the dynamics of gene expression or the movement of cells is to erase the signal of a region and monitor the return of that signal. This method is called acoustic recovery after collapse (ARC), analogous to fluorescence recovery after photobleaching (FRAP). In addition, in many imaging experiments, the output of a gene circuit is read out only once. However, in some cases, it may be desirable to track gene expression over time. To test this, the Applicant examined whether mARG-expressing cells in which the gas vesicles had been collapsed during imaging could re-express these reporters to allow additional imaging. mARG-HEK cells cultured in a nutrient-supported hydrogel produced clear ultrasound contrast 3 days after induction and were able to re-express their acoustic reporters over three additional days (FIG. 14, panels I and J).

Example 19: Mammalian ARGs Enable Ultrasound Imaging of Gene Expression In Vivo

Having characterized the core capabilities of mammalian ARGs for monitoring cellular location and function in vitro, the Applicant set out in this example to test if this new tool can be used for in vivo studies.

ARG-expressing mammalian cells were introduced subcutaneously in the left flank of mice, while mCherry-only control cells were introduced in their right flank (FIG. 15, panel A). The reporter genes in both cell types were under the control of the tetracycline-inducible promoter; accordingly, the mice were intraperitoneally injected with 75 μg doxycycline and 25 mg sodium butyrate on a daily basis (FIG. 15 panel B). After the cells were allowed to express their respective reporter genes, fluorescence and ultrasound contrast of the cells were collected. The Applicant was able, for the first time, to monitor gene expression in vivo with high spatial resolution using BURST ultrasound (FIG. 15, panel C and FIG. 22A, panel A). Ultrasound imaging of the control tumor expressing mCherry did not produce BURST ultrasound signal (FIG. 15, panel D and FIG. 22A).

Interestingly, fluorescence imaging indicated that both tumors were receiving the inducer doxycycline (FIG. 15, panel G), but it appeared as though the entire tumor was equally expressing the reporter genes. However, using ultrasound, only a ‘zone’ of gas vesicle-specific contrast was observed. Using Doppler ultrasound, a technique used to visualize the vasculature, it was observed that the inside of the tumor was avascular, as expected from the short period post inoculation (FIG. 20). After the tumors were sectioned and imaged using fluorescent histology (FIG. 15, panel F and FIG. 21), it became evident that the diffusion of inducer to the tumor cells painted a band of gene expression. This pattern of gene expression was non-invasively visualized with ultrasound using mARGs, whereas fluorescent imaging could not reveal this expression pattern due to the limited penetration of light in tissue. BURST ultrasound images of adjacent planes could be collected to non-invasively image gene expression across the tumor (FIG. 15, panel E and FIG. 19). Furthermore, similar to the in vitro experiments, mARG-expressing cells can express gas vesicles, be imaged, and re-express gas vesicles, enabling repeated monitoring of cellular location and function (FIG. 22B).

Example 20: Ultrasound Contrast in View of GV Concentration in Mammalian Cells In Vitro

A further set of experiments was performed to test the dependence of ultrasound contrast on gas vesicle density in mammalian cell culture. In particular, either a monoculture of mARG-HEK cells was induced with different concentrations of doxycycline, or fully induced mARG-HEK cells were mixed with mCherry-HEK cells at different ratios. All cells were cultured with 5 mM sodium butyrate during expression. The relative ultrasound contrast produced by mARG-HEK cells was then tested in hydrogel as a function of the estimated average number of gas vesicles (GV) per nanoliter present. The number of gas vesicles was quantified after 72 hours of induced expression, as counted in lysates using TEM. Ultrasound contrast was normalized to the maximum in each type of titration.

In particular, the ultrasound contrast of mARG-HEK cells induced with 1 μg/mL doxycycline for 3 days (producing on average 45 gas vesicles per cell) and mixed with mCherry-HEK cells (expressing no gas vesicles) in varying proportions is reported in FIG. 18 with light gray symbols.

The ultrasound contrast of mARG-HEK cells induced with 0.01, 0.05, 0.1 and 1 μg/mL doxycycline for 3 days, expressing on average 0.01±0.004, 1.4±0.4, 3.5±0.3 and 45±5.1 (mean±SEM) gas vesicles per cell, respectively, as quantified by TEM, is reported in FIG. 18 with dark gray symbols.

From this study, illustrated in FIG. 18, the applicants conclude that the presence of mARG-expressing cells can be detected in these mixtures down to 2.5% of total cells, corresponding to <0.5% volumetric density, or about three cells or 135 gas vesicles per voxel with dimensions of 100 μm. A similar voxel-averaged concentration of gas vesicles was detectable in a monoculture of mARG-HEK cells induced to express 1.4±0.6 gas vesicles per cell.
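By way of illustration only, the arithmetic behind these detection figures can be checked with a short sketch. It assumes a cubic voxel with 100 μm sides (i.e. 1 nL, matching the per-nanoliter units used in this Example) and uses the 45 gas vesicles per cell reported for fully induced cells; the variable names are hypothetical and are not part of the disclosure.

```python
# Worked check of the detection-limit figures quoted in Example 20.
# Assumption (not stated as such in the text): the voxel is a cube.

GV_PER_INDUCED_CELL = 45       # mean GVs per fully induced mARG-HEK cell
MIN_DETECTABLE_FRACTION = 0.025  # 2.5% of total cells
MARG_CELLS_PER_VOXEL = 3       # "about three cells" per voxel
VOXEL_SIDE_UM = 100            # assumed voxel side length in micrometres
UM3_PER_NL = 1_000_000         # 1 nL equals 1e6 cubic micrometres

# Volume of the assumed cubic voxel, in nanoliters.
voxel_volume_nl = VOXEL_SIDE_UM ** 3 / UM3_PER_NL

# Gas vesicles contributed by the mARG-expressing cells in one voxel.
gv_per_voxel = MARG_CELLS_PER_VOXEL * GV_PER_INDUCED_CELL

# Mean GVs per cell averaged over all cells in the mixture.
mean_gv_per_cell_in_mixture = MIN_DETECTABLE_FRACTION * GV_PER_INDUCED_CELL

# Total cells per voxel implied by three cells being 2.5% of the total.
total_cells_per_voxel = MARG_CELLS_PER_VOXEL / MIN_DETECTABLE_FRACTION

print(f"voxel volume: {voxel_volume_nl} nL")
print(f"GVs per voxel: {gv_per_voxel}")
print(f"mean GVs per cell in mixture: {mean_gv_per_cell_in_mixture:.3f}")
print(f"implied total cells per voxel: {total_cells_per_voxel:.0f}")
```

Under these assumptions the mixture average comes to roughly 1.1 gas vesicles per cell, which is of the same order as the approximately 1.4 gas vesicles per cell reported for the detectable monoculture.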

Example 21: Selection Funnel for GVES Transfected in Mammalian Cells In Vitro

GVES can be integrated in the genome of mammalian cells, e.g. as in Example 13. Genomic integration methods described above and known by a skilled person will produce a heterogeneous polyclonal population of cells. In this heterogeneous population of cells, there will be a range of GVES expression levels from high expression down to no detectable expression.

The polyclonal population of mammalian cells will produce gas vesicles as illustrated in FIG. 11. Using cell sorting methods such as FACS and/or magnetic-activated cell sorting (MACS), the cells can be binned into groups of cells with similar expression profiles, as exemplified in FIG. 11, panel B, or monoclonal cells can be selected (FIG. 12, panels C and D). A monoclonal cell line is a colony of cells that has been expanded from a single parent cell.

The applicants selected 576 monoclonal cells using FACS from polyclonal HEK-tetON cells that had the GVES of Example 13 integrated in their genome using the piggyBac transposase system. From these cells, the best performing monoclonal cells were identified by measuring cellular viability, fluorescence intensity, and gas vesicle expression as measured by TEM for each cell line after expression for 72 hours upon induction with 1 μg/mL of doxycycline and 5 mM sodium butyrate (Table 19).

TABLE 19 Selection funnel for mARG-HEK cells

Collected from FACS | Formed colonies | Triple positive fluorescence | Formed GVs (TEM) | >1 GVs/cell
576 | 30 | 21 | 12 | 6

The numbers indicate the number of cells or cell lines selected at each stage. From this experiment, the best performing cells produced on average 45 gas vesicles per cell (FIG. 13, panel E).
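As an illustrative sketch only, the stage-to-stage yield of the selection funnel in Table 19 can be computed as follows. The stage labels reflect one reading of the table header, and the function name funnel_rates is hypothetical.

```python
# Stage counts taken from Table 19; labels are a reading of the header.
FUNNEL = [
    ("Collected from FACS", 576),
    ("Formed colonies", 30),
    ("Triple positive fluorescence", 21),
    ("Formed GVs (TEM)", 12),
    (">1 GVs/cell", 6),
]


def funnel_rates(stages):
    """Return (name, count, fraction_of_previous, fraction_of_first) per stage."""
    first = stages[0][1]
    previous = first
    rows = []
    for name, count in stages:
        rows.append((name, count, count / previous, count / first))
        previous = count
    return rows


rows = funnel_rates(FUNNEL)
for name, count, step, overall in rows:
    print(f"{name:30s} {count:4d}  step {step:6.1%}  overall {overall:6.2%}")
```

On these counts, roughly one in a hundred sorted cells yielded a line with more than one gas vesicle per cell, which gives a sense of how many clones may need to be screened.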

Example 22: Exemplary GVGC Polynucleotide Construct to Allow Expression of Two Different GV Cassettes

Experiments were performed to identify elements that can be used to create configurations of a construct designed to allow expression of two different GV cassettes.

An element that can be used in constructs of the present disclosure is exemplified in the exemplary construct in FIG. 23 designed to provide alternating expression of two GV types in a prokaryotic cell and/or mammalian cell, the first GV type encoded by Cluster 1, and the second GV type encoded by Cluster 2, shown as block-shaped arrows facing in opposite orientations of a DNA strand (shown as a straight line), with a promoter between the two clusters. The promoter is flanked by recombination sites (e.g. flippase recognition target, FRT sites) shown as circles. For example, initially, the promoter can be oriented in a direction operatively linked to Cluster 1, initiating expression of gvp genes for the formation of GV type 1.

In the presence of a cognate recombinase (e.g. flippase, Flp, for FRT sites, or Cre for lox sites), expressed from another genetic construct in the mammalian cell, the orientation of the promoter is reversed upon recombination at the recombination sites, and thereafter the promoter is oriented in the opposite direction, operatively linked to Cluster 2, initiating expression of gvp genes for the formation of GV type 2.
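The switching logic described above can be sketched as a minimal toy model. This is an illustration under simplifying assumptions: each recombination event is treated as a clean inversion of the promoter, and the class and method names are hypothetical rather than part of any construct or software of the disclosure.

```python
from dataclasses import dataclass


@dataclass
class InvertiblePromoter:
    """Toy model of a promoter flanked by recombination sites.

    orientation +1: promoter operatively linked to Cluster 1.
    orientation -1: promoter operatively linked to Cluster 2.
    """

    orientation: int = 1

    def recombine(self) -> None:
        # Assumption: one recombinase event inverts the promoter once.
        self.orientation = -self.orientation

    def expressed_cluster(self) -> str:
        if self.orientation == 1:
            return "Cluster 1 (GV type 1)"
        return "Cluster 2 (GV type 2)"


promoter = InvertiblePromoter()
before = promoter.expressed_cluster()
promoter.recombine()  # cognate recombinase present
after = promoter.expressed_cluster()
print(before, "->", after)
```

The model deliberately ignores kinetics, incomplete recombination and cell-to-cell variability; it only captures which cluster is transcribed for a given promoter orientation.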

The use of recombination sites can alternatively control the conditional expression of a transactivating or repressing protein element that controls the activity of the GVES promoter(s). A promoter that controls the expression of the transcription regulatory factor (e.g. TET) and is flanked by recombination sites can be switched to an orientation in which the transactivating or repressing protein element is expressed, or to the opposite orientation, in which the transcription regulatory factor is no longer expressed. As a result, the activity of the GVES promoter can be tuned.

Example 23: Construction of Consolidated Optimized GVES System

Experiments were performed to verify whether the architecture of the mARG of Example 13 can be further consolidated by connecting the gas vesicle protein B gene to the polycistronic construct using an IRES. When this architecture is co-transfected into cells with the booster plasmid, it robustly produces gas vesicles. This strategy is being further pursued to consolidate the mammalian ARG into a single genetic cassette.

In particular, a consolidated mARG construct comprising 2 gene cassettes enabling mammalian gas vesicle expression has been identified following the experiments reported in FIGS. 24A-24D.

The construct encoding gvpB from B. megaterium of Table 15 was combined with the construct in Table 16 using an IRES sequence. A schematic illustrates this in FIG. 24A (top) and Table 20 indicates the gene sequence.

The cassettes from Table 20 and Table 17 were integrated into the genome of HEK293-tetON cells as reported in the materials and methods. GV expression in these cells was detectable using TEM of the cell lysate after 72 hours of expression with 1 μg/mL doxycycline (FIG. 24B).

Similarly, the construct encoding gvpB from B. megaterium of Table 11 was combined with the construct in Table 14 using an IRES sequence. A schematic illustrates this in FIG. 24C (top) and Table 21 indicates the gene sequence. The cassettes from Table 21 and Table 12 were transiently transfected into HEK293T cells as reported in the materials and methods. GV expression in these cells was detectable using BURST ultrasound imaging of the cells after 72 hours of expression (FIG. 24D). HEK control refers to wild-type HEK293T cells, and BiresJFGLK NU refers to HEK293T cells that have been transfected with the constructs in Table 21 and Table 12.

TABLE 20 Exemplary consolidated polynucleotide cassette for polycistronic expression of gvpB with GVA proteins.

Construct: CMV: gvpB: IRES: gvpNFGESKJU-EmGFP: polyA
SEQ ID NO: 454
Sequence:
CGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCGCCCAACGACCCCCGCCCATTGACGTCAATAATGACGTATGTTCCCATAGTAACGCCAATAGGGACTTTCCATTGACGTCAATGGGTGGAGTATTTACGGTAAACTGCCCACTTGGCAGTACATCAAGTGTATCATATGCCAAGTACGCCCCCTATTGACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCCCAGTACATGACCTTATGGGACTTTCCTACTTGGCAGTACATCTACGTATTAGTCATCGCTATTACCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCGTGGATAGCGGTTTGACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAATGTCGTAACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGAGGTCTATATAAGCAGAGCTCGTTTAGTGAACCGTCAGATCGCCTGGAGACGCCATCCACGCTGTTTTGACCTCCATAGAAGACACCGGGACCGATCCAGCCTCCGGACTCTAGCCTAGGCTTTTGCAAAAAGCTATTTAGGTGACACTATAGAAGGTACGCCTGCAGGTACCGAGCTCGGATCCAGTACCCTTCACCATGAGCATCCAGAAGTCCACCAACAGCAGCAGCCTGGCCGAAGTGATCGACCGGATCCTGGACAAGGGCATCGTGATCGACGCCTTCGCCAGAGTGTCCGTCGTGGGCATCGAGATCCTGACCATCGAGGCCAGAGTCGTGATCGCCAGCGTGGACACCTGGCTGAGATATGCCGAAGCCGTGGGCCTGCTGCGGGACGACGTGGAAGAAAATGGCCTGCCCGAGCGGAGCAACAGCTCTGAGGGACAGCCCCGGTTCAGCATCTGAACTAAATCGCACTGTCGGCGTCCCCCCCTAACGTTACTGGCCGAAGCCGCTTGGAATAAGGCCGGTGTGCGTTTGTCTATATGTTATTTTCCACCATATTGCCGTCTTTTGGCAATGTGAGGGCCCGGAAACCTGGCCCTGTCTTCTTGACGAGCATTCCTAGGGGTCTTTCCCCTCTCGCCAAAGGAATGCAAGGTCTGTTGAATGTCGTGAAGGAAGCAGTTCCTCTGGAAGCTTCTTGAAGACAAACAACGTCTGTAGCGACCCTTTGCAGGCAGCGGAACCCCCCACCTGGCGACAGGTGCCTCTGCGGCCAAAAGCCACGTGTATAAGATACACCTGCAAAGGCGGCACAACCCCAGTGCCACGTTGTGAGTTGGATAGTTGTGGAAAGAGTCAAATGGCTCTCCTCAAGCGTATTCAACAAGGGGCTGAAGGATGCCCAGAAGGTACCCCATTGTATGGGATCTGATCTGGGGCCTCGGTGCACATGCTTTACATGTGTTTAGTCGAGGTTAAAAAACGTCTAGGCCCCCCGAACCACGGGGACGTGGTTTTCCTTTGAAAAACACGATGATAATATGGCCACAACCGTGACCGTGCTGACCGACAAGCGGAAGAAGGGCAGCGGCGCCTTCATCCAGGACGACGAGACAAAAGAGGTGCTGAGCAGAGCCCTGAGCTACCTGAAGTCCGGCTACAGCATCCACTTCACCGGACCTGCCGGCGGAGGCAAGACATCTCTGGCTAGAGCCCTGGCCAAGAAACGGAAGCGGCCCGTGATGCTGATGCACGGCAACCACGAGCTGAACAACAAGGACCTGATCGGCGATTTCACCGGCTACACCAGCAAAAAGGTG
ATCGACCAGTACGTGCGGAGCGTGTACAAGAAAGACGAACAGGTGTCCGAGAACTGGCAGGACGGCAGACTGCTGGAAGCCGTGAAGAATGGCTACACCCTGATCTACGACGAGTTCACCAGAAGCAAGCCCGCTACCAACAACATCTTCCTGAGCATCCTTGAGGAGGGCGTGCTGCCCCTGTACGGCGTGAAGATGACCGACCCTTTCGTGCGCGTGCACCCCGACTTCAGAGTGATCTTTACCAGCAACCCCGCCGAGTATGCCGGCGTGTACGATACCCAGGACGCCCTGCTGGACCGGCTGATCACCATGTTCATCGACTACAAGGACATCGACCGGGAAACCGCTATCCTGACCGAGAAAACTGACGTGGAAGAAGACGAGGCCCGGACCATCGTGACCCTGGTGGCCAACGTGCGGAACAGAAGCGGCGACGAGAATAGCAGCGGCCTGAGCCTGAGAGCCAGCCTGATGATTGCCACCCTGGCCACCCAGCAGGACATCCCTATCGATGGCAGCGACGAGGACTTCCAGACCCTGTGCATCGACATCCTGCACCACCCCCTGACCAAGTGCCTGGACGAAGAGAACGCCAAGAGCAAGGCCGAGAAGATCATTCTCGAAGAGTGCAAGAACATCGACACCGAGGAGAAGGGTGCCCCGGGATCTGGCGCAACAAATTTTAGTCTTTTAAAGCAGGCAGGAGACGTCGAGGAAAACCCTGGACCCGTGAGCGAGACAAACGAGACAGGCATCTACATCTTCAGCGCCATCCAGACAGACAAGGATGAGGAATTCGGCGCCGTGGAAGTGGAAGGGACCAAGGCTGAGACATTCCTGATCCGGTATAAGGACGCCGCCATGGTGGCCGCCGAAGTGCCCATGAAGATCTACCACCCCAACCGGCAGAACCTGCTGATGCACCAGAATGCCGTGGCCGCCATCATGGACAAGAACGACACCGTGATCCCCATCAGCTTCGGCAACGTGTTCAAGAGCAAAGAGGACGTGAAGGTGCTCCTGGAAAACCTGTACCCCCAGTTCGAGAAGCTGTTCCCCGCCATCAAGGGAAAGATCGAAGTGGGCCTGAAGGTGATCGGCAAGAAAGAGTGGCTCGAAAAGAAAGTGAACGAGAACCCCGAGCTGGAAAAAGTGTCCGCCAGCGTGAAGGGCAAGAGCGAGGCCGCTGGCTACTACGAGAGAATCCAGCTGGGCGGCATGGCCCAGAAGATGTTCACAAGCCTGCAGAAAGAAGTGAAAACCGACGTGTTCAGCCCCCTGGAAGAAGCCGCCGAGGCCGCCAAAGCCAATGAGCCTACAGGCGAAACAATGCTGCTGAACGCCAGCTTCCTGATCAACAGAGAGGATGAGGCCAAGTTCGACGAGAAAGTCAATGAGGCCCACGAGAACTGGAAGGATAAGGCCGACTTCCACTACAGCGGCCCCTGGCCCGCCTACAACTTCGTGAACATCCGGCTGAAGGTGGAAGAGAAGGGGGCACCTGGCTCGGGAGCGACCAACTTCTCATTACTCAAACAAGCCGGAGACGTTGAGGAGAATCCAGGCCCTGTGCTGCACAAGCTCGTGACCGCCCCCATCAACCTGGTCGTGAAGATCGGCGAGAAGGTGCAGGAAGAGGCCGACAAGCAGCTGTACGACCTGCCCACCATCCAGCAGAAGCTGATCCAGCTGCAGATGATGTTCGAGCTGGGCGAGATCCCCGAGGAAGCCTTCCAGGAAAAAGAGGACGAACTGCTGATGAGATACGAGATCGCCAAGCGGCGCGAGATTGAGCAGTGGGAAGAACTGACCCAGAAGCGGAATGAGGAAAGCGGTGCCCCGGGATCTGGCGCAACAAATTTTAGTCTTTTAAAGCAGGCAGGAGACGTCGAGGAAAACCCTGGACCCGTGGGCGAGCTGCTGTACCTCTACGGCCTGATCCCCACCAAAGAGGCCGCTGCTATCGAGCCCTTCCCATTCTACAAGGGCTTCGACGG
CGAGCACAGCCTGTACCCTATCGCCTTCGACCAAGTGACCGCCGTGGTGTTCAAGCTGGACGCCGACACCTACAGCGAGAAAGTGATCCAGGAAAAGATGGAACAGGACATGAGCTGGCTGCAGGAAAAGGCCTTCCACCACCACGAGACAGTGGCCGCCCTGTATGAGGAATTCACCATCATCCCCCTGAAGTTCTGCACCATCTATAAGGGAGAGGAATCCCTGCAGGCCGCCATCGAGATCAACAAAGAGAAGATCGAAAACTCCCTGACCCTGCTGCAGGGCAACGAGGAATGGAACGTGAAGATCTACTGCGACGACACCGAGCTGAAGAAGGGCATCAGCGAGACAAACGAGAGCGTGAAGGCCAAGAAGCAGGAAATCAGCCACCTGAGCCCCGGCAGACAGTTCTTCGAGAAGAAGAAGATTGACCAGCTCATCGAGAAAGAGCTGGAACTGCACAAGAACAAAGTGTGCGAGGAAATCCACGACAAGCTGATTGAGCTGAGCCTCTACGACTCCGTGAAGAAGAACTGGTCCAAGGACGTGACAGGCGCTGCCGAACAGATGGCCTGGAACAGCGTGTTCCTGCTGCCCAGCCTGCAGATCACCAAGTTCGTGAACGAGATCGAGGAACTCCAGCAGCGGCTGGAGAACAAGGGATGGAAGTTCGAAGTGACCGGCCCCTGGCCTCCCTACCACTTCAGCAGCTTTGCCGGGGCACCTGGCTCGGGAGCGACCAACTTCTCATTACTCAAACAAGCCGGAGACGTTGAGGAGAATCCAGGCCCTGTGAGCCTGAAGCAGAGCATGGAGAATAAGGATATCGCCCTGATCGACATCCTCGACGTGATCCTGGACAAGGGAGTGGCCATCAAGGGCGACCTGATCATCTCTATCGCCGGCGTGGACCTGGTGTACCTGGATCTGAGAGTGCTGATCTCCAGCGTGGAAACCCTGGTGCAGGCCAAAGAGGGCAACCACAAGCCCATCACCAGCGAGCAGTTCGACAAGCAGAAAGAGGAGCTGATGGACGCCACCGGCCAGCCCAGCAAGTGGACAAATCCTCTGGGCAGCGGCGCTCCCGGGTCAGGTGCCACGAATTTTTCGTTGTTGAAGCAAGCTGGGGATGTTGAAGAGAACCCAGGGCCTGTGCAGCCCGTGTCCCAGGCCAACGGCAGAATCCACCTGGATCCCGATCAGGCCGAACAGGGACTGGCCCAGCTCGTGATGACCGTGATCGAGCTGCTGCGGCAGATCGTGGAACGGCACGCCATGAGAAGAGTGGAAGGCGGCACCCTGACCGACGAGCAGATCGAGAATCTGGGAATCGCTCTGATGAACCTGGAGGAGAAGATGGACGAGCTGAAAGAGGTGTTCGGACTGGACGCTGAGGATCTGAACATCGACCTGGGCCCTCTGGGCAGCCTGCTGGGTGCCCCGGGATCTGGCGCAACAAATTTTAGTCTTTTAAAGCAGGCAGGAGACGTCGAGGAAAACCCTGGACCCGTGGCCGTGGAACACAACATGCAGAGCAGCACCATCGTGGACGTGCTGGAAAAGATCCTGGACAAGGGCGTCGTGATCGCCGGGGACATCACAGTGGGAATCGCCGACGTGGAACTGCTGACCATCAAGATCCGGCTGATCGTGGCCAGCGTGGACAAGGCCAAAGAAATCGGCATGGATTGGTGGGAGAACGACCCCTACCTGAGCAGCAAGGGCGCCAACAACAAGGCTCTGGAAGAGGAAAACAAGATGCTGCACGAGCGGCTGAAAACACTGGAAGAGAAGATCGAGACAAAGCGCGGGGCACCTGGCTCGGGAGCGACCAACTTCTCATTACTCAAACAAGCCGGAGACGTTGAGGAGAATCCAGGCCCTGTGAGCACCGGCCCCAGCTTCAGCACCAAGGACAACACCCTGGAATACTTCGTGAAGGCCAGCAACAAGCACGGCTTTAGCCTCGACATCAGCCTGAACGTGAATGGGGCCGTGA
TTAGCGGCACCATGATCAGCGCCAAAGAGTACTTCGACTACCTGAGCGAGACATTCGAAGAGGGCAGCGAAGTGGCCCAGGCCCTGTCTGAGCAGTTTAGCCTGGCTAGCGAGGCCTCCGAGTCTAATGGCGAAGCCGAGGCCCACTTCATCCACCTGAAGAACACCAAGATCTACTGCGGCGACAGCAAGAGCACCCCCAGCAAGGGCAAGATCTTCTGGCGCGGCAAGATCGCCGAGGTGGACGGATTCTTCCTGGGAAAAATCAGCGACGCCAAGTCCACCAGCAAGAAGTCCAGCGGCGCTCCCGGGTCAGGTGCCACGAATTTTTCGTTGTTGAAGCAAGCTGGGGATGTTGAAGAGAACCCAGGGCCTGTGGTGTCCAAGGGCGAGGAACTGTTCACCGGCGTGGTGCCCATCCTGGTGGAACTGGATGGCGACGTGAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAAGGCGACGCCACATACGGAAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCTTGGCCTACCCTCGTGACCACACTGACCTACGGCGTGCAGTGCTTCGCCAGATACCCCGACCACATGAAGCAGCACGATTTCTTCAAGAGCGCCATGCCCGAGGGCTACGTGCAGGAACGGACCATCTTCTTCAAGGACGACGGCAACTACAAGACAAGAGCCGAAGTGAAGTTCGAGGGCGACACCCTCGTGAACCGGATCGAGCTGAAGGGCATCGACTTCAAAGAGGATGGCAACATCCTGGGCCACAAGCTGGAGTACAACTACAACAGCCACAAGGTGTACATCACCGCCGACAAGCAGAAAAACGGCATCAAAGTGAACTTCAAGACCCGGCACAACATCGAGGACGGCAGCGTGCAGCTGGCCGACCACTACCAGCAGAACACCCCCATCGGAGATGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACACAAAGCGCCCTGAGCAAGGACCCCAACGAGAAGCGGGACCACATGGTGCTGCTGGAATTTGTGACCGCCGCTGGCATCACCCTGGGCATGGACGAGCTGTACAAGTGACTCGAGTCTAGAGGGCCCCGTGGCTGTAATCTAGAGGATCCCTCGAGGGGCCCAAGCTTACGCGTGCATGCGACGTCATAGCTCTCTCCCTATAGTGAGTCGTATTATAAGCTAGCTTGGGATCTTTGTGAAGGAACCTTACTTCTGTGGTGTGACATAATTGGACAAACTACCTACAGAGATTTAAAGCTCTAAGGTAAATATAAAATTTTTAAGTGTATAATGTGTTAAACTAGCTGCATATGCTTGCTGCTTGAGAGTTTTGCTTACTGAGTATGATTTATGAAAATATTATACACAGGAGCTAGTGATTCTAATTGTTTGTGTATTTTAGATTCACAGTCCCAAGGCTCATTTCAGGCCCCTCAGTCCTCACAGTCTGTTCATGATCATAATCAGCCATACCACATTTGTAGAGGTTTTACTTGCTTTAAAAAACCTCCCACACCTCCCCCTGAACCTGAAACATAAAATGAATGCAATTGTTGTTGTTAACTTGTTTATTGCAGCTTATAATGGTTACAAATAAAGCAATAGCATCACAAATTTCACAAATAAAGCATTTTTTTCACTGCATTCTAGTTGTGGTTTGTCCAAACTCATCAATGTATCTTATCATGTCTGGATC

TABLE 21 Alternative exemplary consolidated polynucleotide cassette for polycistronic expression of gvpB with GVA proteins.

Construct: CMV: gvpB: IRES: gvpJFGLK: polyA
SEQ ID NO: 455
Sequence:
CGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCGCCCAACGACCCCCGCCCATTGACGTCAATAATGACGTATGTTCCCATAGTAACGCCAATAGGGACTTTCCATTGACGTCAATGGGTGGAGTATTTACGGTAAACTGCCCACTTGGCAGTACATCAAGTGTATCATATGCCAAGTACGCCCCCTATTGACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCCCAGTACATGACCTTATGGGACTTTCCTACTTGGCAGTACATCTACGTATTAGTCATCGCTATTACCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCGTGGATAGCGGTTTGACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAATGTCGTAACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGAGGTCTATATAAGCAGAGCTCGTTTAGTGAACCGTCAGATCGCCTGGAGACGCCATCCACGCTGTTTTGACCTCCATAGAAGACACCGGGACCGATCCAGCCTCCGGACTCTAGCCTAGGCTTTTGCAAAAAGCTATTTAGGTGACACTATAGAAGGTACGCCTGCAGGTACCGAGCTCGGATCCAGTACCCTTCACCATGAGCATCCAGAAGTCCACCAACAGCAGCAGCCTGGCCGAAGTGATCGACCGGATCCTGGACAAGGGCATCGTGATCGACGCCTTCGCCAGAGTGTCCGTCGTGGGCATCGAGATCCTGACCATCGAGGCCAGAGTCGTGATCGCCAGCGTGGACACCTGGCTGAGATATGCCGAAGCCGTGGGCCTGCTGCGGGACGACGTGGAAGAAAATGGCCTGCCCGAGCGGAGCAACAGCTCTGAGGGACAGCCCCGGTTCAGCATCTGAACTAAATCGCACTGTCGGCGTCCCCCCCTAACGTTACTGGCCGAAGCCGCTTGGAATAAGGCCGGTGTGCGTTTGTCTATATGTTATTTTCCACCATATTGCCGTCTTTTGGCAATGTGAGGGCCCGGAAACCTGGCCCTGTCTTCTTGACGAGCATTCCTAGGGGTCTTTCCCCTCTCGCCAAAGGAATGCAAGGTCTGTTGAATGTCGTGAAGGAAGCAGTTCCTCTGGAAGCTTCTTGAAGACAAACAACGTCTGTAGCGACCCTTTGCAGGCAGCGGAACCCCCCACCTGGCGACAGGTGCCTCTGCGGCCAAAAGCCACGTGTATAAGATACACCTGCAAAGGCGGCACAACCCCAGTGCCACGTTGTGAGTTGGATAGTTGTGGAAAGAGTCAAATGGCTCTCCTCAAGCGTATTCAACAAGGGGCTGAAGGATGCCCAGAAGGTACCCCATTGTATGGGATCTGATCTGGGGCCTCGGTGCACATGCTTTACATGTGTTTAGTCGAGGTTAAAAAACGTCTAGGCCCCCCGAACCACGGGGACGTGGTTTTCCTTTGAAAAACACGATGATAATATGGCCACAACCGTGGCCGTGGAACACAACATGCAGAGCAGCACCATCGTGGACGTGCTGGAAAAGATCCTGGACAAGGGCGTCGTGATCGCCGGGGACATCACAGTGGGAATCGCCGACGTGGAACTGCTGACCATCAAGATCCGGCTGATCGTGGCCAGCGTGGACAAGGCCAAAGAAATCGGCATGGATTGGTGGGAGAACGACCCCTACCTGAGCAGCAAGGGCGCCAACAACAAGGCCCTGGAAGAGGAAAACAAGATGCTGCACGAGCGGCT
GAAAACACTGGAAGAGAAGATCGAGACAAAGCGCGGTGCCCCGGGATCTGGCGCAACAAATTTTAGTCTTTTAAAGCAGGCAGGAGACGTCGAGGAAAACCCTGGACCCGTGAGCGAGACAAACGAGACAGGCATCTACATCTTCAGCGCCATCCAGACAGACAAGGATGAGGAATTCGGCGCCGTGGAAGTGGAAGGGACCAAGGCTGAGACATTCCTGATCCGGTATAAGGACGCCGCCATGGTGGCCGCCGAAGTGCCCATGAAGATCTACCACCCCAACCGGCAGAACCTGCTGATGCACCAGAATGCCGTGGCCGCCATCATGGACAAGAACGACACCGTGATCCCCATCAGCTTCGGCAACGTGTTCAAGAGCAAAGAGGACGTGAAGGTGCTCCTGGAAAACCTGTACCCCCAGTTCGAGAAGCTGTTCCCCGCCATCAAGGGAAAGATCGAAGTGGGCCTGAAGGTGATCGGCAAGAAAGAGTGGCTCGAAAAGAAAGTGAACGAGAACCCCGAGCTGGAAAAAGTGTCCGCCAGCGTGAAGGGCAAGAGCGAGGCCGCTGGCTACTACGAGAGAATCCAGCTGGGCGGCATGGCCCAGAAGATGTTCACAAGCCTGCAGAAAGAAGTGAAAACCGACGTGTTCAGCCCCCTGGAAGAAGCCGCCGAGGCCGCCAAAGCCAATGAGCCTACAGGCGAAACAATGCTGCTGAACGCCAGCTTCCTGATCAACAGAGAGGATGAGGCCAAGTTCGACGAGAAAGTCAATGAGGCCCACGAGAACTGGAAGGATAAGGCCGACTTCCACTACAGCGGCCCCTGGCCCGCCTACAACTTCGTGAACATCCGGCTGAAGGTGGAAGAGAAGGGGGCACCTGGCTCGGGAGCGACCAACTTCTCATTACTCAAACAAGCCGGAGACGTTGAGGAGAATCCAGGCCCTGTGCTGCACAAGCTCGTGACCGCCCCCATCAACCTGGTCGTGAAGATCGGCGAGAAGGTGCAGGAAGAGGCCGACAAGCAGCTGTACGACCTGCCCACCATCCAGCAGAAGCTGATCCAGCTGCAGATGATGTTCGAGCTGGGCGAGATCCCCGAGGAAGCCTTCCAGGAAAAAGAGGACGAACTGCTGATGAGATACGAGATCGCCAAGCGGCGCGAGATTGAGCAGTGGGAAGAACTGACCCAGAAGCGGAATGAGGAAAGCGGTGCCCCGGGATCTGGCGCAACAAATTTTAGTCTTTTAAAGCAGGCAGGAGACGTCGAGGAAAACCCTGGACCCGTGGGCGAGCTGCTGTACCTCTACGGCCTGATCCCCACCAAAGAGGCCGCTGCTATCGAGCCCTTCCCATTCTACAAGGGCTTCGACGGCGAGCACAGCCTGTACCCTATCGCCTTCGACCAAGTGACCGCCGTGGTGTTCAAGCTGGACGCCGACACCTACAGCGAGAAAGTGATCCAGGAAAAGATGGAACAGGACATGAGCTGGCTGCAGGAAAAGGCCTTCCACCACCACGAGACAGTGGCCGCCCTGTATGAGGAATTCACCATCATCCCCCTGAAGTTCTGCACCATCTATAAGGGAGAGGAATCCCTGCAGGCCGCCATCGAGATCAACAAAGAGAAGATCGAAAACTCCCTGACCCTGCTGCAGGGCAACGAGGAATGGAACGTGAAGATCTACTGCGACGACACCGAGCTGAAGAAGGGCATCAGCGAGACAAACGAGAGCGTGAAGGCCAAGAAGCAGGAAATCAGCCACCTGAGCCCCGGCAGACAGTTCTTCGAGAAGAAGAAGATTGACCAGCTCATCGAGAAAGAGCTGGAACTGCACAAGAACAAAGTGTGCGAGGAAATCCACGACAAGCTGATTGAGCTGAGCCTCTACGACTCCGTGAAGAAGAACTGGTCCAAGGACGTGACAGGCGCTGCCGAACAGATGGCCTGGAACAGCGTGTTCCTGCTGCCCAGCCTGCAGATCACCAAGTTCGTGA
ACGAGATCGAGGAACTCCAGCAGCGGCTGGAGAACAAGGGATGGAAGTTCGAAGTGACCGGCCCCTGGCCTCCCTACCACTTCAGCAGCTTTGCCGGGGCACCTGGCTCGGGAGCGACCAACTTCTCATTACTCAAACAAGCCGGAGACGTTGAGGAGAATCCAGGCCCTGTGCAGCCCGTGTCCCAGGCCAACGGCAGAATCCACCTGGATCCCGATCAGGCCGAACAGGGACTGGCCCAGCTCGTGATGACCGTGATCGAGCTGCTGCGGCAGATCGTGGAACGGCACGCCATGAGAAGAGTGGAAGGCGGCACCCTGACCGACGAGCAGATCGAGAATCTGGGAATCGCCCTGATGAACCTGGAAGAGAAGATGGACGAGCTGAAAGAGGTGTTCGGACTGGACGCCGAGGACCTGAACATCGACCTGGGCCCTCTGGGCAGCCTGCTGTGATCGAGTCTAGAGGGCCCCGTGGCTGTAATCTAGAGGATCCCTCGAGGGGCCCAAGCTTACGCGTGCATGCGACGTCATAGCTCTCTCCCTATAGTGAGTCGTATTATAAGCTAGCTTGGGATCTTTGTGAAGGAACCTTACTTCTGTGGTGTGACATAATTGGACAAACTACCTACAGAGATTTAAAGCTCTAAGGTAAATATAAAATTTTTAAGTGTATAATGTGTTAAACTAGCTGCATATGCTTGCTGCTTGAGAGTTTTGCTTACTGAGTATGATTTATGAAAATATTATACACAGGAGCTAGTGATTCTAATTGTTTGTGTATTTTAGATTCACAGTCCCAAGGCTCATTTCAGGCCCCTCAGTCCTCACAGTCTGTTCATGATCATAATCAGCCATACCACATTTGTAGAGGTTTTACTTGCTTTAAAAAACCTCCCACACCTCCCCCTGAACCTGAAACATAAAATGAATGCAATTGTTGTTGTTAACTTGTTTATTGCAGCTTATAATGGTTACAAATAAAGCAATAGCATCACAAATTTCACAAATAAAGCATTTTTTTCACTGCATTCTAGTTGTGGTTTGTCCAAACTCATCAATGTATC TTATCATGTCTGGATC

Example 24: Hybrid GVES Constructs can Produce Gas Vesicles in Mammalian Cells

Gvps from different organisms have been combined together to produce hybrid gas vesicle reporting constructs.

The applicants have combined Ana-gvpA, Table 10, with polynucleotide plasmids from B. megaterium of Table 13 and Table 14 to make a hybrid GV. The GVAs are from Anabaena flos-aquae and the GVS are from B. megaterium. HEK293T cells expressing construct Ana-gvpA from Table 10 and the constructs from Table 13 and Table 14 were able to produce gas vesicles as detectable by BURST ultrasound imaging (FIG. 25A). The skilled person will recognize that a hybrid construct with the above gene cassettes in addition to Ana-gvpC will also produce gas vesicles in mammalian cells as detectable by the methods described in this application.

Similarly, the applicants have combined Ana-gvpA, Ana-gvpC and Ana-gvpN from Table 10 together with B. megaterium GVS genes from Table 8. HEK293T cells expressing these hybrid genes were able to produce gas vesicles as detectable by BURST ultrasound imaging (FIG. 25B).

Example 25: GVES Constructs Using Anabaena flos-aquae Genes can Produce Gas Vesicles in Mammalian Cells

Using gvps from Table 10, the applicants have expressed gas vesicles as detectable by TEM and ultrasound imaging in mammalian cells (e.g. HEK293T). HEK293T cells were transfected with the constructs described below, and the resulting gas vesicles were detectable by TEM imaging (FIGS. 26A-D).

The applicants have transfected HEK293T cells using gvps originating from Anabaena flos-aquae as catalogued in the NCBI database, with all genes having the same sequences as shown in Table 10 except for gvpG, which has the following sequence (MGSLTKLLLLPIMGPLNGVVWIAEQIQERTNTEFDAQENLHKQLLSLQLSFDIGEIGEEEFEIQEEEILLKIQALEEEARLELEAEQEEARLELEAEQEDFEYHLNSQQKLIKINISSCYLSIDGRK, SEQ ID NO: 456). This construct produces gas vesicles as detectable by BURST ultrasound imaging (FIG. 26E) but not by TEM, since BURST ultrasound imaging is a more sensitive technique for detecting gas vesicle expression than TEM. The applicants sequenced the gvpG gene from native Anabaena flos-aquae cells that natively express gas vesicles and found the gvpG sequence in Table 10. HEK293T cells transfected with constructs from Table 10 produce gas vesicles as detectable by a higher BURST ultrasound signal (FIG. 26E) and by TEM (FIG. 26A).

Gas vesicles formed from Anabaena flos-aquae genes can be tuned to have different non-linear properties using the structural protein gvpC [44] [45]. The applicants have demonstrated that HEK293T cells expressing the Ana genetic construct in FIG. 26A produce BURST ultrasound images (FIG. 27, panel A, left) but do not produce nonlinear ultrasound images using the amplitude modulation (AM) ultrasound method (FIG. 27, panel B, left). However, HEK293T cells expressing the Ana genetic construct in FIG. 26B are able to produce both BURST ultrasound images (FIG. 27, panel A, right) and nonlinear ultrasound images using the AM ultrasound method (FIG. 27, panel B, right).

These different variants can be used for multiplexed imaging, as their signature ultrasound properties can be distinguished. Importantly, GV constructs that can produce nonlinear ultrasound signal as detectable by amplitude modulation, pulse inversion, amplitude modulation pulse inversion, and other nonlinear ultrasound imaging methods known to the skilled person will be useful for detecting and imaging gas vesicles in complex biological environments (for example, imaging inside an animal).
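As a purely illustrative sketch of how the two signatures described in this Example might be told apart, the following assumes that each region of interest has already been reduced to two yes/no calls, one for BURST signal and one for AM signal. How those calls are made (thresholds, image processing) is outside the sketch, and the function name classify_gv_variant is hypothetical.

```python
def classify_gv_variant(burst_detected: bool, am_detected: bool) -> str:
    """Map a (BURST, AM) detection pair to the variant behaviour in Example 25."""
    if burst_detected and am_detected:
        # Behaviour reported for the construct of FIG. 26B.
        return "BURST and AM signal (FIG. 26B-type construct)"
    if burst_detected:
        # Behaviour reported for the construct of FIG. 26A.
        return "BURST signal only (FIG. 26A-type construct)"
    if am_detected:
        # This combination is not described in the Example.
        return "unclassified"
    return "no gas vesicle signal detected"


for burst, am in [(True, False), (True, True), (False, False)]:
    print(burst, am, "->", classify_gv_variant(burst, am))
```

A two-bit lookup of this kind only distinguishes the two variants described here; further variants would need additional, separable acoustic signatures.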

In summary, provided herein are genetically engineered gas vesicle expression systems (GVES) that are configured to express gas vesicles (GVs) in a mammalian cell, related gas vesicle polynucleotide constructs, gas vesicle reporting genetic circuits, vectors, genetically engineered mammalian cells, non-human mammalian hosts, compositions, methods and systems, which in several embodiments can be used together with contrast-enhanced imaging techniques to detect and report biological events in an imaging target site comprising a mammalian cell and/or organism.

The examples set forth above are provided to give those of ordinary skill in the art a complete disclosure and description of how to make and use the embodiments of the GVES system, polynucleotide constructs for expression of a gas vesicle in mammalian cells, and related GVR genetic circuits, vectors, genetically engineered mammalian cells, compositions, methods and systems of the disclosure, and are not intended to limit the scope of what the inventors regard as their disclosure. Those skilled in the art will recognize how to adapt the features of the exemplified polynucleotide GV constructs, and related genetic circuits, vectors, genetically engineered prokaryotic cells, compositions, methods and systems herein disclosed to additional polynucleotide GV constructs, and related genetic circuits, vectors, genetically engineered mammalian cells, compositions, methods and systems according to various embodiments and scope of the claims.

All patents and publications mentioned in the specification are indicative of the levels of skill of those skilled in the art to which the disclosure pertains.

The entire disclosure of each document cited (including patents, patent applications, journal articles, abstracts, laboratory manuals, books, or other disclosures) in the Background, Summary, Detailed Description, and Examples is hereby incorporated herein by reference. All references cited in this disclosure are incorporated by reference to the same extent as if each reference had been incorporated by reference in its entirety individually. However, if any inconsistency arises between a cited reference and the present disclosure, the present disclosure takes precedence. Further, the computer readable form of the sequence listing of the ASCII text file P2420-US-2020-05-05-Sequence-Listing-ST25 is incorporated herein by reference in its entirety.

The terms and expressions which have been employed herein are used as terms of description and not of limitation, and there is no intention in the use of such terms and expressions of excluding any equivalents of the features shown and described or portions thereof, but it is recognized that various modifications are possible within the scope of the disclosure claimed. Thus, it should be understood that although the disclosure has been specifically disclosed by embodiments, exemplary embodiments and optional features, modification and variation of the concepts herein disclosed can be resorted to by those skilled in the art, and that such modifications and variations are considered to be within the scope of this disclosure as defined by the appended claims.

It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting. As used in this specification and the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the content clearly dictates otherwise. The term “plurality” includes two or more referents unless the content clearly dictates otherwise. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the disclosure pertains.

When a Markush group or other grouping is used herein, all individual members of the group and all combinations and possible sub-combinations of the group are intended to be individually included in the disclosure. Every combination of components or materials described or exemplified herein can be used to practice the disclosure, unless otherwise stated. One of ordinary skill in the art will appreciate that methods, system elements, and materials other than those specifically exemplified may be employed in the practice of the disclosure without resort to undue experimentation. All art-known functional equivalents, of any such methods, device elements, and materials are intended to be included in this disclosure. Whenever a range is given in the specification, for example, a temperature range, a frequency range, a time range, or a composition range, all intermediate ranges and all subranges, as well as, all individual values included in the ranges given are intended to be included in the disclosure. Any one or more individual members of a range or group disclosed herein may be excluded from a claim of this disclosure. The disclosure illustratively described herein suitably may be practiced in the absence of any element or elements, limitation or limitations which is not specifically disclosed herein.

A number of embodiments of the disclosure have been described. The specific embodiments provided herein are examples of useful embodiments of the disclosure and it will be apparent to one skilled in the art that the disclosure can be carried out using a large number of variations of the genetic circuits, genetic molecular components, and methods steps set forth in the present description. As will be obvious to one of skill in the art, methods and systems useful for the present methods and systems may include a large number of optional composition and processing elements and steps.

In particular, it will be understood that various modifications may be made without departing from the spirit and scope of the present disclosure. Accordingly, other embodiments are within the scope of the following claims.

REFERENCES

1. Tashiro, Y., et al., Molecular genetic and physical analysis of gas vesicles in buoyant enterobacteria. Environmental microbiology, 2016. 18(4): p. 1264-1276.
2. Van Keulen, G., et al., Gas vesicles in actinomycetes: old buoys in novel habitats? Trends in microbiology, 2005. 13(8): p. 350-354.
3. Walsby, A. E., Gas vesicles. Microbiol. Rev., 1994. 58(1): p. 94-144.
4. Walsby, A. E., Gas-vacuolate bacteria (apart from cyanobacteria), in The Prokaryotes. 1981, Springer. p. 441-447.
5. Walsby, A. E., Cyanobacteria: planktonic gas-vacuolate forms. The Prokaryotes, a Handbook on Habitats, Isolation, and Identification of Bacteria, 2013. 1: p. 224-235.
6. Woese, C. R., Bacterial evolution. Microbiological reviews, 1987. 51(2): p. 221.
7. Walsby, A. E., Gas vesicles. Microbiol Rev, 1994. 58(1): p. 94-144.
8. Pfeifer, F., Distribution, formation and regulation of gas vesicles. Nat. Rev. Microbiol., 2012. 10(10): p. 705-15.
9. Yi, G., S.-H. Sze, and M. R. Thon, Identifying clusters of functionally related genes in genomes. Bioinformatics, 2007. 23(9): p. 1053-1060.
10. Bourdeau, R. W., et al., Acoustic reporter genes for noninvasive imaging of microorganisms in mammalian hosts. Nature, 2018. 553(7686): p. 86-90.
11. Lakshmanan, A., et al., Preparation of biogenic gas vesicle nanostructures for use as contrast agents for ultrasound and MRI. Nat Protoc, 2017. 12(10): p. 2050-2080.
12. Hayes, P. and R. Powell, The gvpA/C cluster of Anabaena flos-aquae has multiple copies of a gene encoding GvpA. Archives of microbiology, 1995. 164(1): p. 50-57.
13. Kinsman, R. and P. Hayes, Genes encoding proteins homologous to halobacterial Gvps N, J, K, F & L are located downstream of gvpC in the cyanobacterium Anabaena flos-aquae. DNA Sequence, 1997. 7(2): p. 97-106.
14. Myers, E. W. and W. Miller, Optimal alignments in linear space. Computer applications in the biosciences: CABIOS, 1988. 4(1): p. 11-17.
15. Smith, T. F. and M. S. Waterman, Comparison of biosequences. Advances in applied mathematics, 1981. 2(4): p. 482-489.
16. Needleman, S. B. and C. D. Wunsch, A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of molecular biology, 1970. 48(3): p. 443-453.
17. Pearson, W. R. and D. J. Lipman, Improved tools for biological sequence comparison. Proceedings of the National Academy of Sciences, 1988. 85(8): p. 2444-2448.
18. Karlin, S. and S. F. Altschul, Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proceedings of the National Academy of Sciences, 1990. 87(6): p. 2264-2268.
19. Karlin, S. and S. F. Altschul, Applications and statistics for multiple high-scoring segments in molecular sequences. Proceedings of the National Academy of Sciences, 1993. 90(12): p. 5873-5877.
20. Lu, G. J., et al., Acoustically modulated magnetic resonance imaging of gas-filled protein nanostructures. Nat Mater, 2018. 17(5): p. 456-463.
21. Pfeifer, F., Distribution, formation and regulation of gas vesicles. Nat Rev Microbiol, 2012. 10(10): p. 705-15.
22. Li, N. and M. C. Cannon, Gas vesicle genes identified in Bacillus megaterium and functional expression in Escherichia coli. J Bacteriol, 1998. 180(9): p. 2450-8.
23. Tashiro, Y., et al., Molecular genetic and physical analysis of gas vesicles in buoyant enterobacteria. Environ Microbiol, 2016. 18(4): p. 1264-76.
24. Ramsay, J. P., et al., A quorum-sensing molecule acts as a morphogen controlling gas vesicle organelle biogenesis and adaptive flotation in an enterobacterium. Proc Natl Acad Sci USA, 2011. 108(36): p. 14932-7.
25. Schechter, I. and A. Berger, On the size of the active site in proteases. I. Papain. Biochem Biophys Res Commun., 1967. 27(2): p. 157-162.
26. Schechter, I. and A. Berger, On the active site of proteases. 3. Mapping the active site of papain; specific peptide inhibitors of papain. Biochem Biophys Res Commun., 1968. 32(5): p. 898-902.
27. Calvo, S. E., D. J. Pagliarini, and V. K. Mootha, Upstream open reading frames cause widespread reduction of protein expression and are polymorphic among humans. Proc Natl Acad Sci USA, 2009. 106(18): p. 7507-12.
28. Rose, A. B., Intron-mediated regulation of gene expression. Curr Top Microbiol Immunol, 2008. 326: p. 277-90.
29. Reddy A. S. N., G. M., Nuclear pre-mRNA Processing in Plants. Current Topics in Microbiology and Immunology. 326: p. 14.
30. Purnick, P. E. and R. Weiss, The second wave of synthetic biology: from modules to systems. Nat Rev Mol Cell Biol, 2009. 10(6): p. 410-22.
31. Buchler, N. E., U. Gerland, and T. Hwa, On schemes of combinatorial transcription logic. Proceedings of the National Academy of Sciences, 2003. 100(9): p. 5136-5141.
32. Silva-Rocha, R. and V. de Lorenzo, Mining logic gates in prokaryotic transcriptional regulation networks. FEBS letters, 2008. 582(8): p. 1237-1244.
33. Terreno, E., et al., Challenges for Molecular Magnetic Resonance Imaging. Chemical Reviews, 2010. 110(5): p. 3019-3042.
34. Cunningham, C. H., et al., Positive contrast magnetic resonance imaging of cells labeled with magnetic nanoparticles. Magnetic Resonance in Medicine, 2005. 53(5): p. 999-1005.
35. Foucault, M.-L., et al., In vivo bioluminescence imaging for the study of intestinal colonization by Escherichia coli in mice. Applied and environmental microbiology, 2010. 76(1): p. 264-274.
36. Daniel, C., et al., Bioluminescence imaging study of spatial and temporal persistence of Lactobacillus plantarum and Lactococcus lactis in living mice. Applied and environmental microbiology, 2013. 79(4): p. 1086-1094.
37. Chu, J., et al., A bright cyan-excitable orange fluorescent protein facilitates dual-emission microscopy and enhances bioluminescence imaging in vivo. Nat Biotech, 2016. 34(7): p. 760-767.
38. Smith-Bindman, R., et al., Use of diagnostic imaging studies and associated radiation exposure for patients enrolled in large integrated health care systems, 1996-2010. JAMA, 2012. 307(22): p. 2400-9.
39. Foster, F. S., et al., Advances in ultrasound biomicroscopy. Ultrasound in medicine & biology, 2000. 26(1): p. 1-27.
40. Foster, F. S., et al., Principles and applications of ultrasound backscatter microscopy. Ultrasonics, Ferroelectrics and Frequency Control, IEEE Transactions on, 1993. 40(5): p. 608-617.
41. Errico, C., et al., Ultrafast ultrasound localization microscopy for deep super-resolution vascular imaging. Nature, 2015. 527(7579): p. 499-502.
42. Szymczak, A. L. and D. A. A. Vignali, Development of 2A peptide-based strategies in the design of multicistronic vectors. Expert Opinion on Biological Therapy, 2005. 5(5): p. 627-638.
43. Farhadi, A., et al., Recombinantly Expressed Gas Vesicles as Nanoscale Contrast Agents for Ultrasound and Hyperpolarized MRI. AIChE J, 2018. 64(8): p. 2927-2933.
44. Lakshmanan, A., et al., Molecular Engineering of Acoustic Protein Nanostructures. ACS Nano, 2016. 10(8): p. 7314-22.
45. Maresca, D., et al., Nonlinear ultrasound imaging of nanoscale acoustic biomolecules. Appl Phys Lett, 2017. 110(7): p. 073704.

1. A Gas Vesicle Expression System (GVES) configured for expression, in a mammalian cell, of a gene cluster of gvp genes (GVGC) encoding GV proteins capable of forming a GV type, the Gas Vesicle Expression System comprising: a GVPA/B gene expression cassette comprising a gvpA or a gvpB gene under control of a mammalian promoter and additional mammalian regulatory regions in a configuration allowing expression of the gvpA or gvpB protein in the mammalian cell; and one or more additional gvp gene expression cassettes comprising the gvp genes of the GV gene cluster other than the gvpA and gvpB, under control of a mammalian promoter and additional regulatory regions in a configuration allowing expression of the GV proteins other than the gvpA and gvpB in the mammalian cell, wherein each of the one or more additional gvp gene expression cassettes, when comprising two or more gvp genes, further comprises a separation element between the two or more gvp genes configured to provide a separate expression of the corresponding GV protein; and wherein the GVPB cassette and the one or more additional GVP cassettes are operably linked by regulatory sequences allowing co-expression of the GV proteins and formation of the GV type in the mammalian cell.
2. The Gas Vesicle Expression System of claim 1, comprising the GVPA/B gene expression cassette and a single additional gvp gene expression cassette comprising the gvp genes of the GV gene cluster other than the gvpA and gvpB.
3. The Gas Vesicle Expression System of claim 1, wherein the GVPA/B gene expression cassette and the one or more additional gvp gene expression cassettes are within a same polynucleotide construct.
4. The Gas Vesicle Expression System of claim 1, wherein the GVPA/B gene expression cassette and the one or more additional gvp gene expression cassettes are on at least two separate polynucleotide constructs.
5. The Gas Vesicle Expression System of claim 4, wherein the one additional gvp gene expression cassette is a single additional GVP expression cassette on a Gas Vesicle Polynucleotide Construct (GVPC).
6. The Gas Vesicle Expression System of claim 1, wherein the gene cluster of gvp genes (GVGC) is a naturally occurring gas vesicle gene cluster, or an engineered gas vesicle gene cluster.
7. The Gas Vesicle Expression System of claim 1, wherein the gene cluster of gvp genes (GVGC) comprises at least gvpF, gvpG, gvpL, gvpS, gvpK, gvpJ, and gvpU.
8. The Gas Vesicle Expression System of claim 1, wherein the gene cluster of gvp genes (GVGC) comprises a gvpN.
9. The Gas Vesicle Expression System of claim 1, wherein the gas vesicle gene cluster comprises gvp genes from B. megaterium and/or Anabaena flos-aquae.
10. The Gas Vesicle Expression System of claim 9, wherein the gas vesicle gene cluster comprises gvpB, gvpN, gvpF, gvpG, gvpL, gvpS, gvpK, gvpJ, and gvpU from B. megaterium.
11. The Gas Vesicle Expression System of claim 9, wherein the gas vesicle gene cluster is gvpA, gvpC, gvpN, gvpJ, gvpK, gvpF, gvpG, gvpV, and gvpW from Anabaena flos-aquae.
12. The Gas Vesicle Expression System of claim 9, wherein the gas vesicle gene cluster is a hybrid gas vesicle gene cluster comprising gvpR, gvpN, gvpF, gvpG, gvpL, gvpS, gvpK, gvpJ, gvpT and gvpU from B. megaterium and a gvpA gene from Anabaena flos-aquae.
13. The Gas Vesicle Expression System of claim 9, wherein the gas vesicle gene cluster is a hybrid gas vesicle gene cluster comprising gvpA and gvpC from Anabaena flos-aquae, and gvpN, gvpF, gvpG, gvpL, gvpS, gvpK, gvpJ, and gvpU from B. megaterium.
14. The Gas Vesicle Expression System of claim 9, wherein the gas vesicle gene cluster is a hybrid gas vesicle gene cluster comprising gvpA, gvpC and gvpN from Anabaena flos-aquae, and gvpF, gvpG, gvpL, gvpS, gvpK, gvpJ, and gvpU from B. megaterium.
15. A genetically engineered mammalian Gas Vesicle Reporting Molecular Component (GVRMC), the gas vesicle reporting molecular component comprising the Gas Vesicle Expression System (GVES) of claim 1 in which the mammalian regulatory regions comprise a gas vesicle reporting (GVR) target region configured to be activated and/or inhibited by a molecular component of a genetic circuit; wherein the gvp genes and mammalian regulatory regions are in a configuration allowing expression of the gvp genes through activation and/or inhibition of the gas vesicle reporting (GVR) target region, when the genetic circuit operates according to the circuit design in the mammalian cell.
16. A genetically engineered gas vesicle reporting (GVR) genetic circuit (GVRGC) configured for expression in a mammalian cell in which molecular components are connected one to another in a mammalian cell in accordance with a circuit design by activating, inhibiting, binding or converting reactions to form a fully connected network of interacting components, the GVR genetic circuit comprises a mammalian Gas Vesicle Reporting Molecular Component (GVRMC) of claim 15 in a configuration in which the GV proteins are expressed and a gas vesicle (GV) type is provided when the genetic circuit operates according to the circuit design.
17. A method to express a Gas Vesicle in a mammalian cell, the method comprising introducing into the mammalian cell a genetically engineered Gas Vesicle Expression System (GVES) of claim 1, for a time and under conditions to allow expression of the gvp genes and production of the Gas Vesicle type in the mammalian cell.
18. A genetically engineered isolated mammalian cell comprising the Gas Vesicle Expression System (GVES) of claim 1, configured for expression in the genetically engineered mammalian cell.
19. A method to provide a gas vesicle in a mammalian host comprising introducing into a cell of the mammalian host the genetically engineered Gas Vesicle Expression System (GVES) of claim 1, the introducing performed for a time and under conditions to allow expression of the GV proteins and the production of the Gas Vesicle type in the mammalian cell.
20. A genetically engineered non-human mammalian host comprising the Gas Vesicle Expression System (GVES) of claim 1, configured for expression in a mammalian cell of the genetically engineered non-human mammalian host.
21. A method and system to provide a genetically engineered mammalian cell comprising a GVR genetic circuit, the method comprising: genetically engineering the mammalian cell to introduce into the mammalian cell one or more genetically engineered Gas Vesicle Reporting Molecular Components (GVRMC) of claim 15 comprising a gas vesicle reporting (GVR) target region configured to be activated and/or inhibited by a molecular component of the GVR genetic circuit, to provide a Gas Vesicle Reporting Genetic Circuit (GVRGC).
22. A method to image a biochemical event in a mammalian cell comprised in an imaging target site, the method comprising: introducing into the mammalian cell a Gas Vesicle Reporting Molecular Component (GVRMC) of claim 15 to provide a GVR genetic circuit in which an expression of the GV type or an intracellular spatial translocation of the GV type occurs when the GVR genetic circuit operates according to the circuit design in response to the biochemical event, the introducing performed for a time and under conditions allowing expression of the GV protein and production of the GV type or an intracellular spatial translocation of the GV type in response to the biochemical event; and imaging the target site comprising the mammalian host by applying a magnetic field and/or ultrasound to obtain an MRI and/or an ultrasound image of the target site.
23. A method to label a target mammalian host, the method comprising: introducing into the mammalian cell a Gas Vesicle Reporting Molecular Component (GVRMC) of claim 15 to provide a GVR genetic circuit in which an expression of the GV type or an intracellular spatial translocation of the GV type occurs when the GVR genetic circuit operates according to the circuit design in response to a trigger molecular component; wherein the introducing is performed under conditions resulting in presence of the trigger molecular component in the target mammalian host.
24. A composition comprising the Gas Vesicle Expression System (GVES) of claim 1 together with a suitable vehicle.